Re: Write performance help needed

2011-05-05 Thread aaron morton
I was inserting the contents of wikipedia, so the columns were multi-kilobyte
strings. It's a good data source to run tests with, as the records and
relationships are somewhat varied in size.

My main point was that the best way to benchmark cassandra is with multiple
server nodes, multiple client threads/processes, the level of redundancy and
consistency you want to run at in production, and, if you can, some sort of
approximation of the data size. A single cassandra instance may well lose
against a single RDBMS instance in a straight-out race (though, as Jonathan
points out, mongo is not playing fair). But you generally would not deploy a
single cassandra node.
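
To make the "multiple client threads" part concrete, the client side of such a
test can be as simple as the sketch below. It is plain Java; insertRow() is just
a placeholder for whatever Hector/Pelops call you are making, and the thread and
row counts are invented, so treat it as an outline rather than a finished
benchmark.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class InsertBench {
    static final int THREADS = 16;            // more threads than cores; each write mostly waits on the network
    static final int ROWS_PER_THREAD = 10000;
    static final int COLS_PER_ROW = 30;

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(THREADS);
        long start = System.currentTimeMillis();
        for (int t = 0; t < THREADS; t++) {
            final int threadId = t;
            pool.submit(new Runnable() {
                public void run() {
                    for (int r = 0; r < ROWS_PER_THREAD; r++) {
                        // placeholder: build and send one row with your client library
                        insertRow(threadId + ":" + r, COLS_PER_ROW);
                    }
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
        long ms = System.currentTimeMillis() - start;
        long rows = (long) THREADS * ROWS_PER_THREAD;
        System.out.println(rows + " rows (" + (rows * COLS_PER_ROW) + " columns) in " + ms + " ms");
    }

    // Stub so the sketch compiles; swap in a real Hector or Pelops insert.
    static void insertRow(String rowKey, int columns) {
    }
}

Running the same harness as several separate processes (possibly from several
machines) removes the client itself as the bottleneck.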

If you can provide some more details on your test, we may be able to help:
- the target application
- the cassandra schema and any configuration changes
- the java code you used

Hope that helps. 

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 5 May 2011, at 02:01, Steve Smith wrote:

 Since each row in my column family has 30 columns, wouldn't this translate to
 ~8,000 rows per second... or am I misunderstanding something?
 
 Talking in terms of columns, my load test would seem to perform as follows:
 
 100,000 rows / 26 sec * 30 columns/row = 115K columns per second.
 
 That's on a dual core, 2.66 GHz laptop, 4GB RAM... a single running cassandra
 node, hector (java) client.
 
 Am I interpreting things correctly?
 
 - Steve
 
 
 On Tue, May 3, 2011 at 3:59 PM, aaron morton aa...@thelastpickle.com wrote:
 To give an idea, last March (2010) I ran a much older Cassandra on 10 HP
 blades (dual socket, 4 core, 16GB, 2.5" laptop HDD) and was writing around
 250K columns per second, with 500 python processes loading the data from
 wikipedia running on another 10 HP blades.
 
 This was my first out-of-the-box, no-tuning test (other than using sensible
 batch updates). Since then Cassandra has gotten much faster.
 
 Hope that helps
 Aaron
 
 On 4 May 2011, at 02:22, Jonathan Ellis wrote:
 
  You don't give many details, but I would guess:
 
  - your benchmark is not multithreaded
  - mongodb is not configured for durable writes, so you're really only
  measuring the time for it to buffer the writes in memory
  - you haven't loaded enough data to hit the point where mongo's indexes no
  longer fit in memory
 
  On Tue, May 3, 2011 at 8:24 AM, Steve Smith stevenpsmith...@gmail.com wrote:
  I am working for a client that needs to persist 100K-200K records per second
  for later querying. As a proof of concept, we are looking at several options,
  including NoSQL (Cassandra and MongoDB).
  I have been running some tests on my laptop (MacBook Pro, 4GB RAM, 2.66 GHz,
  Dual Core/4 logical cores) and have not been happy with the results.
  The best I have been able to accomplish is 100K records in approximately 30
  seconds. Each record has 30 columns, mostly made up of integers. I have tried
  both the Hector and Pelops APIs, and have tried writing in batches versus one
  at a time. The times have not varied much.
  I am using the out-of-the-box configuration for Cassandra, and while I know
  using 1 disk will have an impact on performance, I would expect to see better
  write numbers than I am getting.
  As a point of reference, with the same test using MongoDB I was able to
  insert 100K records in 3.5 seconds.
  Any tips would be appreciated.
 
  - Steve
 
 
 
 
  --
  Jonathan Ellis
  Project Chair, Apache Cassandra
  co-founder of DataStax, the source for professional Cassandra support
  http://www.datastax.com
 
 



Re: Write performance help needed

2011-05-04 Thread Steve Smith
Since each row in my column family has 30 columns, wouldn't this translate
to ~8,000 rows per second... or am I misunderstanding something?
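(I assume that figure comes from dividing the cluster throughput Aaron quoted
by my row width: 250,000 columns/sec / 30 columns/row ≈ 8,300 rows/sec.)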

Talking in terms of columns, my load test would seem to perform as follows:

100,000 rows / 26 sec * 30 columns/row = 115K columns per second.

That's on a dual core, 2.66 GHz laptop, 4GB RAM... a single running cassandra
node, hector (java) client.

Am I interpreting things correctly?

- Steve


On Tue, May 3, 2011 at 3:59 PM, aaron morton aa...@thelastpickle.com wrote:

 To give an idea, last March (2010) I ran a much older Cassandra on 10 HP
 blades (dual socket, 4 core, 16GB, 2.5" laptop HDD) and was writing around
 250K columns per second, with 500 python processes loading the data from
 wikipedia running on another 10 HP blades.

 This was my first out-of-the-box, no-tuning test (other than using sensible
 batch updates). Since then Cassandra has gotten much faster.

 Hope that helps
 Aaron

 On 4 May 2011, at 02:22, Jonathan Ellis wrote:

  You don't give many details, but I would guess:
 
  - your benchmark is not multithreaded
  - mongodb is not configured for durable writes, so you're really only
  measuring the time for it to buffer the writes in memory
  - you haven't loaded enough data to hit the point where mongo's indexes no
  longer fit in memory
 
  On Tue, May 3, 2011 at 8:24 AM, Steve Smith stevenpsmith...@gmail.com wrote:
  I am working for a client that needs to persist 100K-200K records per second
  for later querying. As a proof of concept, we are looking at several options,
  including NoSQL (Cassandra and MongoDB).
  I have been running some tests on my laptop (MacBook Pro, 4GB RAM, 2.66 GHz,
  Dual Core/4 logical cores) and have not been happy with the results.
  The best I have been able to accomplish is 100K records in approximately 30
  seconds. Each record has 30 columns, mostly made up of integers. I have tried
  both the Hector and Pelops APIs, and have tried writing in batches versus one
  at a time. The times have not varied much.
  I am using the out-of-the-box configuration for Cassandra, and while I know
  using 1 disk will have an impact on performance, I would expect to see better
  write numbers than I am getting.
  As a point of reference, with the same test using MongoDB I was able to
  insert 100K records in 3.5 seconds.
  Any tips would be appreciated.
 
  - Steve
 
 
 
 
  --
  Jonathan Ellis
  Project Chair, Apache Cassandra
  co-founder of DataStax, the source for professional Cassandra support
  http://www.datastax.com




Re: Write performance help needed

2011-05-03 Thread Eric tamme
Use more nodes to increase your write throughput.  Testing on a single
machine is not really a viable benchmark for what you can achieve with
cassandra.


Re: Write performance help needed

2011-05-03 Thread Jonathan Ellis
You don't give many details, but I would guess:

- your benchmark is not multithreaded
- mongodb is not configured for durable writes, so you're really only
measuring the time for it to buffer the writes in memory (see the sketch below)
- you haven't loaded enough data to hit the point where mongo's indexes no
longer fit in memory
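
As one concrete illustration of the durability point: with the 2.x MongoDB Java
driver you can ask for an acknowledged, fsync'd write instead of the default
fire-and-forget behaviour. Collection and field names below are made up, and the
exact WriteConcern constant is from memory, so treat this as a sketch:

import com.mongodb.BasicDBObject;
import com.mongodb.DBCollection;
import com.mongodb.Mongo;
import com.mongodb.WriteConcern;

public class DurableInsertTest {
    public static void main(String[] args) throws Exception {
        Mongo mongo = new Mongo("localhost");
        DBCollection records = mongo.getDB("bench").getCollection("records");

        // SAFE waits for a server acknowledgement; FSYNC_SAFE additionally waits
        // for the data to be flushed to disk rather than just buffered in memory.
        records.setWriteConcern(WriteConcern.FSYNC_SAFE);

        BasicDBObject doc = new BasicDBObject("recordId", 1);
        for (int c = 0; c < 30; c++) {
            doc.append("col" + c, c);   // 30 mostly-integer columns, like your test records
        }
        records.insert(doc);
    }
}

Re-running your 100K-record test with an acknowledged write concern should give
a much more comparable number.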

On Tue, May 3, 2011 at 8:24 AM, Steve Smith stevenpsmith...@gmail.com wrote:
 I am working for a client that needs to persist 100K-200K records per second
 for later querying. As a proof of concept, we are looking at several options,
 including NoSQL (Cassandra and MongoDB).
 I have been running some tests on my laptop (MacBook Pro, 4GB RAM, 2.66 GHz,
 Dual Core/4 logical cores) and have not been happy with the results.
 The best I have been able to accomplish is 100K records in approximately 30
 seconds. Each record has 30 columns, mostly made up of integers. I have tried
 both the Hector and Pelops APIs, and have tried writing in batches versus one
 at a time. The times have not varied much.
 I am using the out-of-the-box configuration for Cassandra, and while I know
 using 1 disk will have an impact on performance, I would expect to see better
 write numbers than I am getting.
 As a point of reference, with the same test using MongoDB I was able to
 insert 100K records in 3.5 seconds.
 Any tips would be appreciated.

 - Steve




-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: Write performance help needed

2011-05-03 Thread aaron morton
To give an idea, last March (2010) I ran a much older Cassandra on 10 HP
blades (dual socket, 4 core, 16GB, 2.5" laptop HDD) and was writing around
250K columns per second, with 500 python processes loading the data from
wikipedia running on another 10 HP blades.

This was my first out-of-the-box, no-tuning test (other than using sensible
batch updates). Since then Cassandra has gotten much faster.
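
By "sensible batch updates" I mean something along these lines with Hector:
queue all the columns for a number of rows on one Mutator, then send them in a
single execute() rather than one call per column. Keyspace, column family and
column names here are placeholders, and the exact HFactory signatures are from
memory, so check them against the version you are using:

import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.mutation.Mutator;

public class BatchLoad {
    public static void main(String[] args) {
        Cluster cluster = HFactory.getOrCreateCluster("Test Cluster", "localhost:9160");
        Keyspace keyspace = HFactory.createKeyspace("Bench", cluster);
        Mutator<String> mutator = HFactory.createMutator(keyspace, StringSerializer.get());

        int rowsPerBatch = 100;    // worth experimenting with; huge batches just move the pressure around
        for (int r = 0; r < 100000; r++) {
            String rowKey = "row" + r;
            for (int c = 0; c < 30; c++) {
                // queued locally, nothing is sent to the cluster yet
                mutator.addInsertion(rowKey, "Records",
                        HFactory.createStringColumn("col" + c, Integer.toString(c)));
            }
            if ((r + 1) % rowsPerBatch == 0) {
                mutator.execute();   // one round trip for the whole batch of rows
            }
        }
        mutator.execute();           // flush whatever is left over
    }
}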
  
Hope that helps
Aaron

On 4 May 2011, at 02:22, Jonathan Ellis wrote:

 You don't give many details, but I would guess:
 
 - your benchmark is not multithreaded
 - mongodb is not configured for durable writes, so you're really only
 measuring the time for it to buffer the writes in memory
 - you haven't loaded enough data to hit the point where mongo's indexes no
 longer fit in memory
 
 On Tue, May 3, 2011 at 8:24 AM, Steve Smith stevenpsmith...@gmail.com wrote:
 I am working for a client that needs to persist 100K-200K records per second
 for later querying. As a proof of concept, we are looking at several options,
 including NoSQL (Cassandra and MongoDB).
 I have been running some tests on my laptop (MacBook Pro, 4GB RAM, 2.66 GHz,
 Dual Core/4 logical cores) and have not been happy with the results.
 The best I have been able to accomplish is 100K records in approximately 30
 seconds. Each record has 30 columns, mostly made up of integers. I have tried
 both the Hector and Pelops APIs, and have tried writing in batches versus one
 at a time. The times have not varied much.
 I am using the out-of-the-box configuration for Cassandra, and while I know
 using 1 disk will have an impact on performance, I would expect to see better
 write numbers than I am getting.
 As a point of reference, with the same test using MongoDB I was able to
 insert 100K records in 3.5 seconds.
 Any tips would be appreciated.
 
 - Steve
 
 
 
 
 -- 
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of DataStax, the source for professional Cassandra support
 http://www.datastax.com