Hey folks, I am interested in what others have seen regarding how much depth
and width (column families, rows, and columns) they can or do write per batch
and concurrently, and where the inflection point is at which performance
degrades.  I have been expanding my use of counters and am finding some
interesting nuances, some related to my code and implementation, but others
I can't yet quantify.

My batches are 1x5x5 (1 row in each of 5 column families, with 5 counter
columns per row in each of those column families).  I have 3 nodes, each with
100 connections, and another thread pool of 100 threads rolling through
6,000,000 rows of data, sending them out to Cassandra (the 1x5x5 matrix is
constructed from each line).  I am finding this to be my sweet spot right
now, but it is still not performing fantastically (or at least not what I had
hoped), and I am wondering what else (if anything) I can do to tweak settings
so that I can push in more columns or rows.  I find that changing my pool
settings very much causes errors in the client lib, but I will send an email
to that list separately; I think I have that figured out on my own for now.
A rough sketch of how each batch gets built is below.
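For reference, here is roughly what one 1x5x5 batch looks like, as a minimal
sketch assuming the Hector client; the cluster, keyspace, and CF/column names
are hypothetical placeholders, not my actual code:

import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.mutation.Mutator;

public class CounterBatch {
    // hypothetical names standing in for my real 5 CFs and 5 columns
    private static final String[] CFS  = {"cf1", "cf2", "cf3", "cf4", "cf5"};
    private static final String[] COLS = {"c1", "c2", "c3", "c4", "c5"};

    // one 1x5x5 batch: 1 row per CF, 5 counter columns per row
    public static void writeBatch(Keyspace keyspace, String rowKey) {
        Mutator<String> mutator =
            HFactory.createMutator(keyspace, StringSerializer.get());
        for (String cf : CFS) {
            for (String col : COLS) {
                mutator.addCounter(rowKey, cf,
                    HFactory.createCounterColumn(col, 1L));
            }
        }
        mutator.execute();  // single round trip: 25 counter increments
    }

    public static void main(String[] args) {
        // placeholder cluster/keyspace names
        Cluster cluster = HFactory.getOrCreateCluster("Test Cluster",
            "node1:9160");
        Keyspace keyspace = HFactory.createKeyspace("CounterKS", cluster);
        writeBatch(keyspace, "row-built-from-one-input-line");
    }
}

In my setup each of the 100 worker threads does the equivalent of
writeBatch() once per input line.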

Thanks in advance!  I hope to get more work going on this in the next day or
so, in a more methodical way, to find the right counts so I can build a
sparse matrix that performs best for both the system and the business.

/*
Joe Stein
http://www.linkedin.com/in/charmalloc
Twitter: @allthingshadoop
*/
