On Wed, Apr 18, 2012 at 5:00 PM, Dan Feldman <hriunde...@gmail.com> wrote:
> Hi all,
>
> I'm trying to optimize moving data from Cassandra to HDFS using either Ruby
> or Python client. Right now, I'm playing around on my staging server, an 8
> GB single node machine. My data in Cassandra (1.0.8) consist of 2 rows (for
> now) with ~150k super columns each (I know, I know - super columns are bad).
> Every super column has ~25 columns totaling ~800 bytes per super column.
>
> I should also mention that currently the database is static - there are no
> writes/updates, only reads.
>
> Anyways, in my Python/Ruby scripts, I'm taking slices 5000 super columns
> long from a single row.  It takes 13 seconds with Ruby and 8 seconds with
> pycassa to get a single slice. Or, in other words, it's currently reading at
> speeds of less than 500 kB per second. The speed seems to be linear with the
> length of a slice (i.e. 6 seconds for 2500 scs for ruby). If I run nodetool
> cfstats while my script is running, it tells me that my read latency on the
> column family is ~300ms.
>
> I assume that this is not normal and thus was wondering what parameters I
> could tweak to improve the performance.
>

Is your client multi-threaded?  The single-threaded performance of
Cassandra isn't at all impressive and it really is designed for
dealing with a lot of simultaneous requests.
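
For illustration, here is a minimal sketch of a multi-threaded reader in Python. It is not a drop-in script: `fetch_slice` is a hypothetical stand-in that only sleeps and returns dummy data, and the integer offsets assume the slice boundaries can be worked out ahead of time (with real super column names you would need known start/finish names for each range). In a real client the stand-in would wrap the library's own slice call, such as a pycassa `ColumnFamily.get` with a column range.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def fetch_slice(row_key, offset, count):
    """Hypothetical stand-in for one blocking slice read.

    A real script would call the Cassandra client here; this version only
    sleeps briefly to simulate round-trip latency and returns dummy items.
    """
    time.sleep(0.01)
    return [(row_key, i) for i in range(offset, offset + count)]


def fetch_row_concurrently(row_key, total, slice_size, workers=8):
    """Read one wide row as several slices issued from a thread pool.

    Assumes slice boundaries are known in advance (integer offsets here).
    """
    offsets = range(0, total, slice_size)

    def read(offset):
        # The last slice may be shorter than slice_size.
        return fetch_slice(row_key, offset, min(slice_size, total - offset))

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in offset order even though reads overlap.
        chunks = list(pool.map(read, offsets))
    return [item for chunk in chunks for item in chunk]


if __name__ == "__main__":
    # 150,000 super columns in slices of 5000, as in the original post.
    columns = fetch_row_concurrently("row1", 150_000, 5000, workers=8)
    print(len(columns))
```

With blocking network reads the threads spend most of their time waiting on the server, so several requests can be in flight at once even under Python's GIL. How much this helps on a single 8 GB node would have to be measured.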


-- 
Aaron Turner
http://synfin.net/         Twitter: @synfinatic
http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows
Those who would give up essential Liberty, to purchase a little temporary
Safety, deserve neither Liberty nor Safety.
    -- Benjamin Franklin
"carpe diem quam minimum credula postero"
