Ah.. yes. Great benchmarks. If I’m interpreting them correctly it was ~15x slower for 22 columns vs 2 columns?
Guess we have to refactor again :-P

Not the end of the world of course.

On Sun, Aug 23, 2015 at 1:53 PM, Jeff Jirsa <jeff.ji...@crowdstrike.com> wrote:

> A few months back, a user in #cassandra on freenode mentioned that when
> they transitioned from thrift to cql, their overall performance decreased
> significantly. They had 66 columns per table, so I ran some benchmarks with
> various versions of Cassandra and thrift/cql combinations.
>
> It shouldn’t really surprise you that more columns = more work = slower
> operations. It’s not necessarily the size of the writes, but the amount of
> work that needs to be done with the extra cells (2 large columns totaling
> 2k performs better than 66 small columns totaling 0.66k even though it’s
> three times as much raw data being written to disk)
>
> https://gist.github.com/jeffjirsa/6e481b132334dfb6d42c
>
> 2.0.13, 2 tokens per node, 66 columns, 10 bytes per column, thrift (660 bytes per):
> cassandra-stress --operation INSERT --num-keys 1000000 --columns 66 --column-size=10 --replication-factor 2 --nodesfile=nodes
> Averages from the middle 80% of values:
> interval_op_rate : 10720
>
> 2.0.13, 2 tokens per node, 20 columns, 10 bytes per column, thrift (200 bytes per):
> cassandra-stress --operation INSERT --num-keys 1000000 --columns 20 --column-size=10 --replication-factor 2 --nodesfile=nodes
> Averages from the middle 80% of values:
> interval_op_rate : 28667
>
> 2.0.13, 2 tokens per node, 2 large columns, thrift (2048 bytes per):
> cassandra-stress --operation INSERT --num-keys 1000000 --columns 2 --column-size=1024 --replication-factor 2 --nodesfile=nodes
> Averages from the middle 80% of values:
> interval_op_rate : 23489
>
> From: <burtonator2...@gmail.com> on behalf of Kevin Burton
> Reply-To: "user@cassandra.apache.org"
> Date: Sunday, August 23, 2015 at 1:02 PM
> To: "user@cassandra.apache.org"
> Subject: Practical limitations of too many columns/cells ?
>
> Is there any advantage to using say 40 columns per row vs using 2 columns
> (one for the pk and the other for data) and then shoving the data into a
> BLOB as a JSON object?
>
> To date, we’ve been just adding new columns. I profiled Cassandra and
> about 50% of the CPU time is spent on CPU doing compactions. Seeing that
> CS is being CPU bottlenecked maybe this is a way I can optimize it.
>
> Any thoughts?
>
> --
>
> Founder/CEO Spinn3r.com
> Location: *San Francisco, CA*
> blog: http://burtonator.wordpress.com
> … or check out my Google+ profile
> <https://plus.google.com/102718274791889610666/posts>
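
For anyone trying to read the three benchmark runs quoted in this thread side by side, here is a rough sketch of the arithmetic. It assumes interval_op_rate is operations per second and that each operation writes one row with the stated number of columns; the names RUNS and summarize are made up for this example and are not part of cassandra-stress.

```python
# Figures copied from the quoted 2.0.13 thrift runs.
# Each tuple: (label, columns per row, bytes per column, interval_op_rate).
# Assumption: interval_op_rate is operations (rows written) per second.
RUNS = [
    ("66 cols x 10 B", 66, 10, 10720),
    ("20 cols x 10 B", 20, 10, 28667),
    ("2 cols x 1024 B", 2, 1024, 23489),
]


def summarize(runs):
    """Derive cells/s and payload bytes/s from each run's op rate."""
    rows = []
    for label, columns, column_size, op_rate in runs:
        rows.append({
            "label": label,
            "ops_per_s": op_rate,
            # One cell per column per operation.
            "cells_per_s": op_rate * columns,
            # Raw column payload only; ignores keys and per-cell overhead.
            "bytes_per_s": op_rate * columns * column_size,
        })
    return rows


if __name__ == "__main__":
    rows = summarize(RUNS)
    baseline = rows[0]  # compare everything against the 66-column run
    for row in rows:
        ratio = row["ops_per_s"] / baseline["ops_per_s"]
        print(
            f"{row['label']:<16} "
            f"ops/s={row['ops_per_s']:>6} ({ratio:.2f}x vs 66 cols)  "
            f"cells/s={row['cells_per_s']:>7}  "
            f"bytes/s={row['bytes_per_s']:>9}"
        )
```

Comparing ops/s, cells/s and bytes/s separately makes it easier to see which ratio a given "Nx slower" claim is actually about, since the three measures rank these runs differently.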