Hi, Dmitry. There are some gaps in the information you included here, so to help clarify what's going on I'm going to rattle off some questions.
Is your test driver only making requests of a single EC2 instance, or are you querying all 7 nodes directly in some sort of load distribution? If you aren't querying all 7 nodes directly, then you will likely see performance on par with a cluster with only a single "physical" node.

Are you certain that the 7 nodes are communicating with each other? The output of the "riak-admin status" command should list the nodes in the "ring_members" field.

Are the "documents" a separate key with Riak's built-in links to the "entities", or are they keys with a data blob that refers to the entities?[1] If the latter, have you read http://blog.basho.com/2010/02/24/link-walking-by-example/ ?

It's also important for me to note that EC2 instances do not necessarily have the same characteristics as actual physical hardware when it comes to preventing resource contention. Since EC2 instances are virtualized, you have no idea what other load the physical host of a given instance may be under. As a result, it is possible to have a Riak instance running on the same hardware as another IO- and CPU-intensive instance without your knowledge, the two impeding each other to a certain degree. We've had a number of users complain of performance problems with Riak clusters running on EC2 at various times. From my personal and anecdotal experience, EC2 seems to be pretty heavily oversubscribed much of the time, which leads to intermittent performance issues for all kinds of applications.

All of that is just a long-winded way of saying: don't expect shared virtualized resources to provide the same performance as dedicated physical hardware. But if your testing harness is testing properly, you should still see at least somewhat better performance than you're seeing now.

--Ryan

1. I'm not certain whether you're saying that the documents are stored in a separate bucket from the entities in the same Riak cluster, or in a separate Riak cluster entirely.
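On the load-distribution point: each test worker should connect to a different ring member rather than funneling everything through one node. Here's a rough sketch; the node names are placeholders, and I'm assuming the pre-1.0 native Erlang interface (riak:client_connect/1), so adjust to whatever version you're running:

```erlang
%% Sketch: spread reads across all ring members instead of one node.
%% Node names below are placeholders for your actual EC2 instances.
Nodes = ['[email protected]', '[email protected]', '[email protected]'],
Clients = [begin {ok, C} = riak:client_connect(N), C end || N <- Nodes],
%% Pick a client per request, e.g. by hashing the key so requests
%% are distributed roughly evenly across the connected nodes:
Pick = fun(Key) ->
           lists:nth(erlang:phash2(Key, length(Clients)) + 1, Clients)
       end,
C1 = Pick(<<"some-key">>),
{ok, _Obj} = C1:get(<<"entities">>, <<"some-key">>, 2).
```

Any single coordinating node will cap your throughput, so even a crude round-robin or hash like this should show whether the 7-node ring is actually being exercised.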
On Fri, Jun 25, 2010 at 12:02 AM, Dmitry Demeshchuk <[email protected]> wrote:

> Greetings.
>
> I tried running Riak with bitcask backend on 7 Amazon EC2 standard
> large instances (7.5 GB RAM, 4 EC2 CPU units) and performed some
> tests.
> For comparison, I built up the following Riak clusters:
>
> 7 physical nodes ring
> 1 physical node ring (on one of the 7 instances, but I ran the tests
> separately so the rings won't mess with each other)
> 1 physical node ring on an extra large instance (15 GB RAM, 8 EC2 CPU units)
>
> and ran a couple of tests with putting and getting data using Riak
> native Erlang API (not PBC).
>
> I had 2 buckets, the first one having small (averagely about 1KB)
> values, but a lot of them (about several millions) called "entities",
> and the second one having lists of keys from the first database,
> called "documents". So, every document consists of a lot of entities
> (I used 100 and 1000 for my tests). So, the approximate size of every
> document was either 100KB or 1MB.
>
> So, I performed tests of putting documents and entities to database
> and then obtaining them. I tried to perform reads and writes using 10
> and 100 concurrent Erlang processes (well, 100 was generally too much
> as I ran out of CPU), first from only one machine and then from 2 and
> 3 machines at the same time (for the 7-nodes ring). Of course, the
> entities were obtained using map-reduce.
>
> The first weird thing was that even with 10 concurrent reads and
> writes the performance didn't differ for all three clusters. Okay, 1
> large and 1 extra large nodes don't differ so much but the 7 nodes
> should have given me some performance, shouldn't they?
>
> The second thing was that the average read time for one document with
> 1000 entities was about 5 seconds, and again, the number of machines
> in the cluster didn't affect the result.
> I guess I just stumbled upon
> the performance of the instance that sent all the map-reduce requests
> and then collected the replies because when I ran tests on the other 2
> instances, all three had the same performance.
>
> The other strange thing was that during data writes most of the time
> nodes were not io-loaded. If it was a one-stream write, it would be
> obvious. But it were 10 and then 20 and 30 simultaneous writing
> processes!
>
> Unfortunately I cannot provide the detailed results now, they are
> pretty messed up. I'm going to use basho_bench to make good graphs and
> tables of these tests.
>
> Any advises for the future tests or any explanations for such strange
> performance?
>
> Thank you in advance and sorry for a little messed up e-mail.
>
> --
> Best regards,
> Dmitry Demeshchuk
>
> _______________________________________________
> riak-users mailing list
> [email protected]
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
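P.S. Since you mention moving to basho_bench: a config along these lines will exercise the whole ring from a single harness. Treat the exact option names and values below as illustrative, and double-check them against the example configs that ship in the basho_bench repo (the driver here is the protocol buffers one, even though your tests so far used the native API):

```erlang
%% riak_bench.config -- illustrative sketch; verify option names against
%% the examples/ directory in basho_bench before running.
{mode, max}.
{duration, 10}.                          % minutes
{concurrent, 10}.                        % worker processes
{driver, basho_bench_driver_riakc_pb}.
{key_generator, {uniform_int, 100000}}.
{value_generator, {fixed_bin, 1024}}.    % ~1KB values, like your "entities"
{operations, [{get, 1}, {put, 1}]}.
%% List all 7 nodes so workers spread their connections across the ring:
{riakc_pb_ips, [{10,0,0,1}, {10,0,0,2}, {10,0,0,3}]}.   % placeholder IPs
{riakc_pb_port, 8087}.
```

Listing every node in the IP list matters for the same reason as above: if the workers all connect to one node, you're benchmarking that node's coordination capacity, not the cluster.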
