I was inserting the contents of Wikipedia, so the columns were multi-kilobyte
strings. It's a good data source to run tests with, as the records and
relationships are somewhat varied in size.
My main point was to say the best way to benchmark Cassandra is with multiple
server nodes,
Since each row in my column family has 30 columns, wouldn't this translate
to ~8,000 rows per second, or am I misunderstanding something?
Talking in terms of columns, my load test would seem to perform as follows:
100,000 rows / 26 sec * 30 columns/row = 115K columns per second.
That's on a
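For anyone who wants to redo the arithmetic above with their own numbers, here is a minimal sketch. The function name `throughput` is just made up for illustration; the inputs are the figures quoted above (100,000 rows, 26 seconds, 30 columns per row).

```python
def throughput(rows, seconds, columns_per_row):
    """Return (rows per second, columns per second) for a load test."""
    rows_per_sec = rows / seconds
    return rows_per_sec, rows_per_sec * columns_per_row


# Figures quoted in the message above.
rows_per_sec, cols_per_sec = throughput(rows=100_000, seconds=26, columns_per_row=30)
print(f"{rows_per_sec:,.0f} rows/s, {cols_per_sec:,.0f} columns/s")
# prints: 3,846 rows/s, 115,385 columns/s
```

So the 115K figure is a columns-per-second number; in rows it works out to a bit under 4K per second.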
Use more nodes to increase your write throughput. Testing on a single
machine is not really a viable benchmark for what you can achieve with
Cassandra.
You don't give many details, but I would guess:
- your benchmark is not multithreaded
- mongodb is not configured for durable writes, so you're really only
measuring the time for it to buffer it in memory
- you haven't loaded enough data to reach the point where mongo's index
doesn't fit in memory anymore
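On the first guess (a single-threaded benchmark), a rough sketch of what a multithreaded load generator could look like is below. This is only an illustration under assumptions: `write_row` is a hypothetical stand-in that sleeps to simulate a network round trip, not a real Cassandra or MongoDB client call, and `run_benchmark` is a made-up helper name. Swap in your actual client's insert call to measure anything real.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def write_row(row_key, columns):
    """Hypothetical stand-in for a real client insert (not a real driver API)."""
    time.sleep(0.001)  # pretend each write costs about 1 ms of network latency


def run_benchmark(num_rows, columns_per_row, num_threads):
    """Write num_rows rows from num_threads threads; return (rows/s, columns/s)."""
    # Dummy payload: 1 KB string per column.
    columns = {f"col{i}": "x" * 1024 for i in range(columns_per_row)}
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # list() waits for every write and re-raises any exception from a worker.
        list(pool.map(lambda k: write_row(f"row{k}", columns), range(num_rows)))
    elapsed = time.perf_counter() - start
    return num_rows / elapsed, num_rows * columns_per_row / elapsed


if __name__ == "__main__":
    for threads in (1, 8):
        rows_s, cols_s = run_benchmark(num_rows=200, columns_per_row=30, num_threads=threads)
        print(f"{threads} thread(s): {rows_s:,.0f} rows/s, {cols_s:,.0f} columns/s")
```

With a client that blocks on each write, a single thread mostly measures round-trip latency rather than what the server can absorb, which is why the thread count matters here.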
On Tue,
To give an idea, last March (2010) I ran a much older Cassandra on 10 HP
blades (dual socket, 4 core, 16GB, 2.5" laptop HDD) and was writing around 250K
columns per second, with 500 Python processes loading the data from Wikipedia
running on another 10 HP blades.
This was my first out of