Re: optimizing use of sstableloader / SSTableSimpleUnsortedWriter

2012-08-28 Thread aaron morton
 dataset... just under 4 months of data is less than 2GB!  I'm pretty
 thrilled.
Be thrilled by all the compressions! :)

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 28/08/2012, at 6:10 AM, Aaron Turner synfina...@gmail.com wrote:

 On Mon, Aug 27, 2012 at 1:19 AM, aaron morton aa...@thelastpickle.com wrote:
 After thinking about how
 sstables are done on disk, it seems best (required??) to write out
 each row at once.
 
 Sort of. We only want one instance of the row per SSTable created.
 
 Ah, good clarification, although I think for my purposes they're one
 and the same.
 
 
 Any other tips to improve load time or reduce the load on the cluster
 or subsequent compaction activity?
 
 Fewer SSTables means less compaction. So go as high as you can on the
 bufferSizeInMB param for the
 SSTableSimpleUnsortedWriter.
 
 Ok.
 
 There is also a SSTableSimpleWriter. Because it expects rows to be ordered
 it does not buffer and can create bigger sstables.
 https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/io/sstable/SSTableSimpleWriter.java
 
 Hmm, probably not realistic in my situation... doing so would likely
 thrash the disks on my PG server a lot more and kill my read
 throughput, and that server is already hitting a wall.
 
 
 Right now my Cassandra data store has about 4 months of data and we
 have 5 years of historical
 
 ingest all the histories!
 
 Actually, I was a little worried about how much space that would
 take... my estimate was ~305GB/year, which is a lot when you consider
 the 300-400GB/node limit (something I didn't know about at the time).
 However, compression has turned out to be extremely efficient on my
 dataset... just under 4 months of data is less than 2GB!  I'm pretty
 thrilled.
 
 
 -- 
 Aaron Turner
 http://synfin.net/ Twitter: @synfinatic
 http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows
 Those who would give up essential Liberty, to purchase a little temporary
 Safety, deserve neither Liberty nor Safety.
-- Benjamin Franklin
 carpe diem quam minimum credula postero



Re: optimizing use of sstableloader / SSTableSimpleUnsortedWriter

2012-08-27 Thread aaron morton
 After thinking about how
 sstables are done on disk, it seems best (required??) to write out
 each row at once.  
Sort of. We only want one instance of the row per SSTable created. 


 Any other tips to improve load time or reduce the load on the cluster
 or subsequent compaction activity? 

Fewer SSTables means less compaction. So go as high as you can on the
bufferSizeInMB param for the 
SSTableSimpleUnsortedWriter. 
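
For reference, here's a minimal sketch of what that looks like with the
1.1-era API from the DataStax bulk-loading post; the directory, keyspace
and column family names are hypothetical, and the last constructor
argument is bufferSizeInMB:

    import java.io.File;
    import java.io.IOException;
    import org.apache.cassandra.db.marshal.AsciiType;
    import org.apache.cassandra.dht.RandomPartitioner;
    import org.apache.cassandra.io.sstable.SSTableSimpleUnsortedWriter;

    public class BulkWriterSketch
    {
        public static void main(String[] args) throws IOException
        {
            // A bigger bufferSizeInMB means more rows are buffered in memory
            // before each flush, so the writer emits fewer, larger sstables
            // and leaves less compaction work for the cluster after the load.
            SSTableSimpleUnsortedWriter writer = new SSTableSimpleUnsortedWriter(
                    new File("/tmp/load/Metrics/datapoints"), // output dir (hypothetical)
                    new RandomPartitioner(),
                    "Metrics",          // keyspace (hypothetical)
                    "datapoints",       // column family (hypothetical)
                    AsciiType.instance, // column name comparator
                    null,               // no subcomparator (not a super CF)
                    256);               // bufferSizeInMB: as high as memory allows
            // ... newRow()/addColumn() calls go here ...
            writer.close();
        }
    }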

There is also a SSTableSimpleWriter. Because it expects rows to be ordered it 
does not buffer and can create bigger sstables.
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/io/sstable/SSTableSimpleWriter.java
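
The catch is that it needs rows fed to it in partitioner order, i.e.
sorted by token rather than by raw key. A rough sketch of that
constraint, where the keys list is a hypothetical stand-in for however
you pull row keys out of your source:

    import java.nio.ByteBuffer;
    import java.util.Collections;
    import java.util.Comparator;
    import java.util.List;
    import org.apache.cassandra.dht.IPartitioner;
    import org.apache.cassandra.dht.RandomPartitioner;

    public class SortedKeyFeed
    {
        // RandomPartitioner orders rows by MD5 token, not by raw key, so
        // keys must be sorted by decorated key before being written with
        // SSTableSimpleWriter.
        static void sortForPartitioner(List<ByteBuffer> keys)
        {
            final IPartitioner partitioner = new RandomPartitioner();
            Collections.sort(keys, new Comparator<ByteBuffer>()
            {
                public int compare(ByteBuffer a, ByteBuffer b)
                {
                    return partitioner.decorateKey(a).compareTo(partitioner.decorateKey(b));
                }
            });
            // feed keys in this order to SSTableSimpleWriter.newRow(...)
        }
    }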


 Right now my Cassandra data store has about 4 months of data and we
 have 5 years of historical 
ingest all the histories!

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 25/08/2012, at 12:56 PM, Aaron Turner synfina...@gmail.com wrote:

 So I've read: http://www.datastax.com/dev/blog/bulk-loading
 
 Are there any tips for using sstableloader /
 SSTableSimpleUnsortedWriter to migrate time series data from our old
 datastore (PostgreSQL) to Cassandra?  After thinking about how
 sstables are done on disk, it seems best (required??) to write out
 each row at once.  I.e.: if each row == 1 year's worth of data and you
 have, say, 30,000 rows, write one full row at a time (a full year's worth
 of data points for a given metric) rather than 1 data point for 30,000
 rows.
 
 Any other tips to improve load time or reduce the load on the cluster
 or subsequent compaction activity?  All my CFs I'll be writing to
 use compression and leveled compaction.
 
 Right now my Cassandra data store has about 4 months of data and we
 have 5 years of historical (not sure yet how much we'll actually load,
 but minimally 1 year's worth).
 
 Thanks!
 
 -- 
 Aaron Turner
 http://synfin.net/ Twitter: @synfinatic
 http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows
 Those who would give up essential Liberty, to purchase a little temporary
 Safety, deserve neither Liberty nor Safety.
-- Benjamin Franklin
 carpe diem quam minimum credula postero



Re: optimizing use of sstableloader / SSTableSimpleUnsortedWriter

2012-08-27 Thread Aaron Turner
On Mon, Aug 27, 2012 at 1:19 AM, aaron morton aa...@thelastpickle.com wrote:
 After thinking about how
 sstables are done on disk, it seems best (required??) to write out
 each row at once.

 Sort of. We only want one instance of the row per SSTable created.

Ah, good clarification, although I think for my purposes they're one
and the same.


 Any other tips to improve load time or reduce the load on the cluster
 or subsequent compaction activity?

 Fewer SSTables means less compaction. So go as high as you can on the
 bufferSizeInMB param for the
 SSTableSimpleUnsortedWriter.

Ok.

 There is also a SSTableSimpleWriter. Because it expects rows to be ordered
 it does not buffer and can create bigger sstables.
 https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/io/sstable/SSTableSimpleWriter.java

Hmm, probably not realistic in my situation... doing so would likely
thrash the disks on my PG server a lot more and kill my read
throughput, and that server is already hitting a wall.


 Right now my Cassandra data store has about 4 months of data and we
 have 5 years of historical

 ingest all the histories!

Actually, I was a little worried about how much space that would
take... my estimate was ~305GB/year, which is a lot when you consider
the 300-400GB/node limit (something I didn't know about at the time).
However, compression has turned out to be extremely efficient on my
dataset... just under 4 months of data is less than 2GB!  I'm pretty
thrilled.


-- 
Aaron Turner
http://synfin.net/ Twitter: @synfinatic
http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows
Those who would give up essential Liberty, to purchase a little temporary
Safety, deserve neither Liberty nor Safety.
-- Benjamin Franklin
carpe diem quam minimum credula postero


optimizing use of sstableloader / SSTableSimpleUnsortedWriter

2012-08-24 Thread Aaron Turner
So I've read: http://www.datastax.com/dev/blog/bulk-loading

Are there any tips for using sstableloader /
SSTableSimpleUnsortedWriter to migrate time series data from our old
datastore (PostgreSQL) to Cassandra?  After thinking about how
sstables are done on disk, it seems best (required??) to write out
each row at once.  I.e.: if each row == 1 year's worth of data and you
have, say, 30,000 rows, write one full row at a time (a full year's worth
of data points for a given metric) rather than 1 data point for 30,000
rows.
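
To make that concrete, here's roughly the write loop I have in mind,
borrowing the writer from the DataStax post; the row key format and the
DataPoint/fetch plumbing are made up for illustration:

    import java.io.IOException;
    import org.apache.cassandra.io.sstable.SSTableSimpleUnsortedWriter;
    import static org.apache.cassandra.utils.ByteBufferUtil.bytes;

    // Hypothetical value object for one sample pulled out of PostgreSQL.
    class DataPoint { long epochSeconds; double value; }

    class RowAtATimeLoader
    {
        // One row == one metric-year: emit the whole row before moving on,
        // so each row key is written exactly once per generated sstable.
        static void writeMetricYear(SSTableSimpleUnsortedWriter writer,
                                    String metric, int year,
                                    Iterable<DataPoint> points) throws IOException
        {
            long ts = System.currentTimeMillis() * 1000;  // microseconds
            writer.newRow(bytes(metric + ":" + year));    // hypothetical row key
            for (DataPoint p : points)                    // column name = sample time
                writer.addColumn(bytes(p.epochSeconds), bytes(p.value), ts);
        }
    }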

Any other tips to improve load time or reduce the load on the cluster
or subsequent compaction activity?  All my CFs I'll be writing to
use compression and leveled compaction.

Right now my Cassandra data store has about 4 months of data and we
have 5 years of historical (not sure yet how much we'll actually load,
but minimally 1 year's worth).

Thanks!

-- 
Aaron Turner
http://synfin.net/ Twitter: @synfinatic
http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows
Those who would give up essential Liberty, to purchase a little temporary
Safety, deserve neither Liberty nor Safety.
-- Benjamin Franklin
carpe diem quam minimum credula postero