Riak and Amazon EC2

Dmitry Demeshchuk Fri, 25 Jun 2010 01:48:56 -0700

Greetings.

I tried running Riak with the bitcask backend on 7 Amazon EC2 standard
large instances (7.5 GB RAM, 4 EC2 CPU units) and performed some
tests.
For comparison, I built the following Riak clusters:


a ring of 7 physical nodes
a ring of 1 physical node (on one of the 7 instances, but I ran the tests
separately so the rings wouldn't interfere with each other)
a ring of 1 physical node on an extra large instance (15 GB RAM, 8 EC2 CPU units)

and ran a couple of tests putting and getting data through Riak's
native Erlang API (not PBC).

I had 2 buckets. The first one, called "entities", held small values
(about 1 KB on average), but a lot of them (several million). The
second one, called "documents", held lists of keys from the first
bucket. So every document consists of many entities (I used 100 and
1000 in my tests), and the approximate size of every document was
either 100 KB or 1 MB.

So, I tested putting documents and entities into the database and then
reading them back. I ran reads and writes with 10 and 100 concurrent
Erlang processes (well, 100 was generally too many, as I ran out of
CPU), first from only one machine and then from 2 and 3 machines at
the same time (for the 7-node ring). Of course, the entities were
fetched using map-reduce.
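
A minimal sketch of this kind of concurrency test, written in Python
rather than Erlang, with an in-memory dict standing in for the cluster.
The names run_load, put_entity, get_entity and store are hypothetical
and are not Riak calls; in a real run the two operations would be
replaced by actual client put/get requests.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_load(op, keys, concurrency):
    """Call op(key) for every key using `concurrency` workers.

    Returns (per-operation latencies in seconds, total elapsed seconds).
    """
    def timed(key):
        start = time.perf_counter()
        op(key)
        return time.perf_counter() - start

    began = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(timed, keys))
    elapsed = time.perf_counter() - began
    return latencies, elapsed


# Assumption: a plain dict replaces the Riak cluster for illustration only.
store = {}


def put_entity(key):
    store[key] = b"x" * 1024  # roughly 1 KB per entity, as in the tests


def get_entity(key):
    return store[key]


if __name__ == "__main__":
    keys = [f"entity-{i}" for i in range(1000)]
    for workers in (10, 100):
        for name, op in (("put", put_entity), ("get", get_entity)):
            latencies, elapsed = run_load(op, keys, workers)
            mean_ms = 1000 * sum(latencies) / len(latencies)
            print(f"{name} workers={workers} mean={mean_ms:.3f} ms "
                  f"total={elapsed:.3f} s")
```

Recording both the per-operation latency and the total elapsed time
makes it easier to tell whether adding workers (or client machines)
raises throughput or only raises latency.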

The first weird thing was that even with 10 concurrent reads and
writes, the performance was the same for all three clusters. Okay, 1
large node and 1 extra large node don't differ that much, but the 7
nodes should have given me some performance gain, shouldn't they?

The second thing was that the average read time for one document with
1000 entities was about 5 seconds, and again, the number of machines
in the cluster didn't affect the result. I guess I just hit the limit
of the instance that sent all the map-reduce requests and then
collected the replies, because when I ran the tests from the other 2
instances, all three had the same performance.
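
To put the 5-second figure in perspective, here is a rough
back-of-the-envelope calculation using only the rounded numbers above
(about 5 s per document, 1000 entities per document, about 1 KB per
entity). The variable names are illustrative, and the result is an
average per reading process, not a measurement.

```python
read_time_s = 5.0        # average read time for one document (rounded)
entities_per_doc = 1000  # entities referenced by one document
entity_size_kb = 1.0     # approximate size of one entity

# Average time spent per entity within a single document read.
per_entity_ms = read_time_s * 1000 / entities_per_doc

# Effective rates seen by one reading process.
entities_per_s = entities_per_doc / read_time_s
kb_per_s = entities_per_s * entity_size_kb

print(f"{per_entity_ms} ms per entity")
print(f"{entities_per_s} entities/s, about {kb_per_s} KB/s per reader")
```

That works out to roughly 5 ms per entity and about 200 KB/s per
reader, which is far below what disk or network alone would be expected
to limit and so is at least consistent with the guess that the
requesting instance is the bottleneck.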

The other strange thing was that during data writes the nodes were not
I/O-loaded most of the time. If it had been a single-stream write, that
would be understandable. But there were 10, then 20, and then 30
simultaneous writing processes!


Unfortunately I cannot provide the detailed results now; they are
pretty messy. I'm going to use basho_bench to produce proper graphs and
tables for these tests.

Any advice for future tests, or any explanations for such strange
performance?

Thank you in advance, and sorry for the somewhat messy e-mail.

-- 
Best regards,
Dmitry Demeshchuk

_______________________________________________
riak-users mailing list
[email protected]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
