Hey folks, I am interested in what others have seen regarding the depth and width (column families, rows, and columns) they can or do write per batch and simultaneously, and where the inflection point is at which performance degrades. I have been expanding my use of counters and am finding some interesting nuances, some related to my code and implementation, but others I can't yet quantify.
My batches are 1x5x5: 1 row for each of 5 column families, and 5 columns for each of those rows within each of the 5 column families. I have 3 nodes, each with 100 connections, and another thread pool of 100 threads rolling through 6,000,000 rows of data, sending it out to Cassandra (the 1x5x5 matrix is constructed from each line). I am finding this to be my sweet spot right now, but it is still not performing fantastically (or at least not as well as I had hoped), and I am wondering what else (if anything) I can do to tweak settings so I can push in more columns or rows. I find that changing my pool settings much from this causes errors in the client library; I will send an email to that list separately, though I think I have that figured out on my own for now. Thanks in advance!!!

I hope to get more work going on this in the next day or so, in a more methodical way, to find the right counts so I can build a sparse matrix that performs best for the system and the business.

/*
Joe Stein
http://www.linkedin.com/in/charmalloc
Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
*/
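P.S. In case it helps to see the shape concretely, here is a minimal sketch of how one 1x5x5 counter batch could be built and sent, assuming a Hector-style client; the keyspace, CF names, and column names below are placeholders, not my actual schema:

import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.mutation.Mutator;

public class CounterBatchSketch {
    // Placeholder names for the 5 column families (1 row each per batch)
    private static final String[] COLUMN_FAMILIES =
            { "cf1", "cf2", "cf3", "cf4", "cf5" };

    // Builds and sends one 1x5x5 batch: 5 CFs x 1 row x 5 counter columns
    public static void writeBatch(Keyspace keyspace, String rowKey, long[] increments) {
        Mutator<String> mutator = HFactory.createMutator(keyspace, StringSerializer.get());
        for (String cf : COLUMN_FAMILIES) {
            for (int col = 0; col < 5; col++) {
                // Each addCounter queues one increment; nothing is sent yet
                mutator.addCounter(rowKey, cf,
                        HFactory.createCounterColumn("col" + col, increments[col],
                                StringSerializer.get()));
            }
        }
        mutator.execute(); // one batch_mutate carrying all 25 increments
    }
}

Each worker thread in the pool of 100 would call writeBatch once per input line, so all 25 increments for a line go out in a single round trip rather than 25.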