Alain, 
I found out that the client node is an m1.small, and the Cassandra nodes are 
m1.large. 

This is what is contained in each row: {dev1-dc1r-redir-0.unica.net/B9tk: 
{batchID: 2486272}}. Not a whole lot of data. 



If you don't use EBS, how is data persistence then maintained in the event that 
an instance goes down for whatever reason? 

Ken.... 
----- Original Message -----
From: "Alain RODRIGUEZ" <[email protected]> 
To: [email protected] 
Sent: Thursday, February 14, 2013 8:34:06 AM 
Subject: Re: Write performance expectations... 


Hi Ken, 


You really should take a look at my first answer and give us more 
information: at least the size of your inserts and the EC2 instance type you 
are using. You should also consider using instance store rather than EBS. 
Please go through the points I already raised. 


Alain 



2013/2/14 Peter Lin < [email protected] > 


It could be that the instances are I/O limited. 

I've been running benchmarks with Cassandra 1.1.9 for the last 2 weeks on 
an AMD FX 8 core with 32GB of RAM. 

With 24 threads I get roughly 20K inserts per second. Each insert is 
only about 100-150 bytes. 
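
As a rough back-of-the-envelope check, the figures above can be turned into a 
payload volume. This is only a sketch: it assumes that just the 100-150 byte 
insert payload counts (protocol framing, replication and commit log overhead 
are ignored), and the function name payload_mb_per_second is made up for 
illustration. 

```python
# Rough payload throughput implied by the benchmark figures above.
# Assumption: only the 100-150 byte insert payload is counted; protocol
# framing, replication and commit log overhead are ignored.

def payload_mb_per_second(inserts_per_second, bytes_per_insert):
    """Return payload volume in megabytes (10**6 bytes) per second."""
    return inserts_per_second * bytes_per_insert / 1e6

INSERTS_PER_SECOND = 20000   # "roughly 20K inserts per second"
THREADS = 24                 # "with 24 threads"

low = payload_mb_per_second(INSERTS_PER_SECOND, 100)
high = payload_mb_per_second(INSERTS_PER_SECOND, 150)
per_thread = INSERTS_PER_SECOND / THREADS

print("payload: %.1f to %.1f MB/s" % (low, high))
print("per thread: about %.0f inserts/s" % per_thread)
```

On these assumptions the payload alone is only a few megabytes per second, so 
raw data volume is unlikely to be the limiting factor at this insert size. 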



On Thu, Feb 14, 2013 at 8:07 AM, < [email protected] > wrote: 
> Using multithreading, inserting 2000 per thread, resulted in no throughput 
> increase. Each thread is taking about 4 seconds per, indicating a bottleneck 
> elsewhere. 
> 
> Ken.... 
> 
> ________________________________ 
> From: "Tyler Hobbs" < [email protected] > 
> To: [email protected] 
> Sent: Wednesday, February 13, 2013 11:06:30 AM 
> 
> Subject: Re: Write performance expectations... 
> 
> 2500 inserts per second is about what a single python thread using pycassa 
> can do against a local node. Are you using multiple threads for the 
> inserts? Multiple processes? 
> 
> 
> On Wed, Feb 13, 2013 at 8:21 AM, Alain RODRIGUEZ < [email protected] > 
> wrote: 
>> 
>> Is there a particular reason for you to use EBS? Instance stores are 
>> recommended because they improve performance by reducing I/O 
>> throttling. 
>> 
>> Another thing you should be aware of is that replicating the data to all 
>> nodes reduces your performance; it is more or less as if you had only one 
>> node (in terms of performance, I mean). 
>> 
>> Also, writing to different datacenters probably induces some network 
>> latency. 
>> 
>> You should give the EC2 instance type (m1.xlarge / m1.large / ...) if you 
>> want some feedback about the 2500 w/s, and also give the mean size of your 
>> rows. 
>> 
>> Alain 
>> 
>> 
>> 2013/2/13 < [email protected] > 
>> 
>>> Hello, 
>>> New member here, and I have (yet another) question on write 
>>> performance. 
>>> 
>>> I'm using Apache Cassandra version 1.1, Python 2.7 and Pycassa 1.7. 
>>> 
>>> I have a cluster of 2 datacenters, each with 3 nodes, on AWS EC2 using 
>>> EBS and the RandomPartitioner. I'm writing to a column family in a keyspace 
>>> that's replicated to all nodes in both datacenters, with a consistency 
>>> level of LOCAL_QUORUM. 
>>> 
>>> I'm seeing write performance of around 2500 rows per second. 
>>> 
>>> Is this in the ballpark for this kind of configuration? 
>>> 
>>> Thanks in advance. 
>>> 
>>> Ken.... 
>>> 
>> 
> 
> 
> 
> -- 
> Tyler Hobbs 
> DataStax 


