A few months back, a user in #cassandra on freenode mentioned that when they 
transitioned from Thrift to CQL, their overall performance decreased 
significantly. They had 66 columns per table, so I ran some benchmarks with 
various versions of Cassandra and Thrift/CQL combinations.

It shouldn’t really surprise you that more columns = more work = slower 
operations. It’s not necessarily the size of the writes, but the amount of work 
that needs to be done for the extra cells: in the runs below, 2 large columns 
totaling 2k per row perform better than 66 small columns totaling 0.66k, even 
though that’s three times as much raw data being written to disk.

https://gist.github.com/jeffjirsa/6e481b132334dfb6d42c

2.0.13, 2 tokens per node, 66 columns, 10 bytes per column, thrift (660 bytes 
per row):
cassandra-stress --operation INSERT --num-keys 1000000 --columns 66 --column-size=10 --replication-factor 2 --nodesfile=nodes
Averages from the middle 80% of values:
interval_op_rate          : 10720


2.0.13, 2 tokens per node, 20 columns, 10 bytes per column, thrift (200 bytes 
per row):
cassandra-stress --operation INSERT --num-keys 1000000 --columns 20 --column-size=10 --replication-factor 2 --nodesfile=nodes
Averages from the middle 80% of values:
interval_op_rate          : 28667


2.0.13, 2 tokens per node, 2 large columns, 1024 bytes per column, thrift 
(2048 bytes per row):
cassandra-stress --operation INSERT --num-keys 1000000 --columns 2 --column-size=1024 --replication-factor 2 --nodesfile=nodes
Averages from the middle 80% of values:
interval_op_rate          : 23489
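
To make the cells-versus-bytes point concrete, here is a rough back-of-the-envelope sketch in Python that turns the three op rates above into cells per second and payload bytes per second. It assumes that one stress operation writes one full row, and it counts only the column values (no column names, timestamps, or other per-cell overhead). The names RUNS and derive are made up for this illustration.

```python
# Figures copied from the three cassandra-stress runs above:
# (label, columns per row, bytes per column, interval_op_rate)
RUNS = [
    ("66 x 10 B", 66, 10, 10720),
    ("20 x 10 B", 20, 10, 28667),
    ("2 x 1024 B", 2, 1024, 23489),
]


def derive(columns, column_size, op_rate):
    """Return (cells written per second, payload bytes per second).

    Assumes one operation == one row of `columns` cells, and counts
    only the raw column values, not any per-cell overhead.
    """
    cells_per_sec = columns * op_rate
    bytes_per_sec = cells_per_sec * column_size
    return cells_per_sec, bytes_per_sec


for label, columns, column_size, op_rate in RUNS:
    cells, nbytes = derive(columns, column_size, op_rate)
    print(f"{label:>10}: {op_rate:>6} ops/s, {cells:>7} cells/s, "
          f"{nbytes / 1e6:6.1f} MB/s")
```

Under those assumptions, the 2-column run pushes roughly seven times the payload bytes per second of the 66-column run while writing about one fifteenth as many cells, which is consistent with per-cell work, not raw write size, being the limiting factor.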


From:  <burtonator2...@gmail.com> on behalf of Kevin Burton
Reply-To:  "user@cassandra.apache.org"
Date:  Sunday, August 23, 2015 at 1:02 PM
To:  "user@cassandra.apache.org"
Subject:  Practical limitations of too many columns/cells ?

Is there any advantage to using say 40 columns per row vs using 2 columns (one 
for the pk and the other for data) and then shoving the data into a BLOB as a 
JSON object? 

To date, we’ve been just adding new columns.  I profiled Cassandra and about 
50% of the CPU time is spent on CPU doing compactions.  Seeing that CS is being 
CPU bottlenecked maybe this is a way I can optimize it.

Any thoughts?

-- 
Founder/CEO Spinn3r.com
Location: San Francisco, CA
blog: http://burtonator.wordpress.com
… or check out my Google+ profile

