Ah, yes. Great benchmarks. If I'm interpreting them correctly, it was roughly
2x slower for 66 columns vs 2 columns (10720 vs 23489 ops/s)?

Guess we have to refactor again :-P

Not the end of the world of course.
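A quick back-of-the-envelope check on the numbers quoted below (assuming
interval_op_rate is ops/sec; the variable names here are mine, not from the
gist):

```python
# Op rates and row sizes from Jeff's three cassandra-stress runs below.
rates = {66: 10720, 20: 28667, 2: 23489}          # columns -> ops/sec
row_bytes = {66: 66 * 10, 20: 20 * 10, 2: 2 * 1024}  # columns -> bytes/row

# Op-rate slowdown of the 66-column run vs the 2-large-column run.
slowdown = rates[2] / rates[66]
print(f"66 cols vs 2 cols, op-rate slowdown: {slowdown:.1f}x")

# Raw throughput tells the same story from the other side: the 2-column
# run pushes far more bytes per second for the same hardware.
for cols in (66, 20, 2):
    mb_s = rates[cols] * row_bytes[cols] / 1e6
    print(f"{cols:>2} columns: {rates[cols]:>5} ops/s ~= {mb_s:.1f} MB/s")
```

So the cost tracks cell count much more than write size, which matches
Jeff's reading of the results.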

On Sun, Aug 23, 2015 at 1:53 PM, Jeff Jirsa <jeff.ji...@crowdstrike.com>
wrote:

> A few months back, a user in #cassandra on freenode mentioned that when
> they transitioned from thrift to cql, their overall performance decreased
> significantly. They had 66 columns per table, so I ran some benchmarks with
> various versions of Cassandra and thrift/cql combinations.
>
> It shouldn’t really surprise you that more columns = more work = slower
> operations. It’s not necessarily the size of the writes, but the amount of
> work that needs to be done on the extra cells (2 large columns totaling
> 2k perform better than 66 small columns totaling 0.66k, even though that’s
> three times as much raw data being written to disk).
>
> https://gist.github.com/jeffjirsa/6e481b132334dfb6d42c
>
> 2.0.13, 2 tokens per node, 66 columns, 10 bytes per column, thrift (660
> bytes per):
> cassandra-stress --operation INSERT --num-keys 1000000 --columns 66
> --column-size=10 --replication-factor 2 --nodesfile=nodes
> Averages from the middle 80% of values: interval_op_rate : 10720
>
> 2.0.13, 2 tokens per node, 20 columns, 10 bytes per column, thrift (200
> bytes per):
> cassandra-stress --operation INSERT --num-keys 1000000 --columns 20
> --column-size=10 --replication-factor 2 --nodesfile=nodes
> Averages from the middle 80% of values: interval_op_rate : 28667
>
> 2.0.13, 2 tokens per node, 2 large columns, thrift (2048 bytes per):
> cassandra-stress --operation INSERT --num-keys 1000000 --columns 2
> --column-size=1024 --replication-factor 2 --nodesfile=nodes
> Averages from the middle 80% of values: interval_op_rate : 23489
>
> From: <burtonator2...@gmail.com> on behalf of Kevin Burton
> Reply-To: "user@cassandra.apache.org"
> Date: Sunday, August 23, 2015 at 1:02 PM
> To: "user@cassandra.apache.org"
> Subject: Practical limitations of too many columns/cells ?
>
> Is there any advantage to using say 40 columns per row vs using 2 columns
> (one for the pk and the other for data) and then shoving the data into a
> BLOB as a JSON object?
>
> To date, we’ve just been adding new columns.  I profiled Cassandra and
> about 50% of the CPU time is spent doing compactions.  Seeing that
> Cassandra is CPU-bottlenecked, maybe this is a way I can optimize it.
>
> Any thoughts?
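[For the archive: a minimal sketch of the tradeoff being asked about, using
Python's json module; the field names and counts here are hypothetical, not
from either schema.]

```python
import json

# Hypothetical row with 40 small fields.
fields = {f"col{i}": f"value{i}" for i in range(40)}

# Option A: one CQL column per field -> 40 cells per row on disk,
# each cell carrying its own name and timestamp overhead.
cells_a = len(fields)

# Option B: primary key + one blob column -> a single data cell,
# with all fields packed into one JSON payload.
blob = json.dumps(fields, separators=(",", ":")).encode("utf-8")
cells_b = 1

print(f"Option A: {cells_a} cells/row")
print(f"Option B: {cells_b} cell/row, blob of {len(blob)} bytes")
```

The tradeoff: fewer cells means less per-cell work at write and compaction
time, but updating one field now requires rewriting the entire blob, and
per-field reads and filtering at the CQL level are gone.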
>
> --
>
> Founder/CEO Spinn3r.com
> Location: *San Francisco, CA*
> blog: http://burtonator.wordpress.com
> … or check out my Google+ profile
> <https://plus.google.com/102718274791889610666/posts>
>
>


