The questions you raise are important. I would add that in some scenarios, processing your data locally (in your own client program rather than in Riak) could improve performance. In such a setup, each box would run both Riak and your own software.
The Dynamo paper discusses data locality and points at two strategies: “(…) (1) route its request through a generic load balancer that will select a node based on load information, or (2) use a partition-aware client library that routes requests directly to the appropriate coordinator nodes. The advantage of the first approach is that the client does not have to link any code specific to Dynamo in its application, whereas the second strategy can achieve lower latency because it skips a potential forwarding step.”

http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html

So far, I have not seen any Riak client library that uses strategy (2). What I have seen is a lot of discussion about generic load balancing (1). I am in the process of writing a client library myself, but it only supports specifying an IP address or host name to contact. It would be helpful if a wiki page (under Best Practices) were created to discuss various load-balancing configurations.

I am also wondering whether a Riak client could use strategy (2), as Dynamo clients can.

Kind regards
Runar Jordahl
http://blog.epigent.com/
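To make strategy (2) more concrete, here is a minimal sketch of how a partition-aware client might pick a coordinator node. It assumes the client already knows the ring layout (the number of partitions and which node owns each one); how a Riak client would obtain that state is exactly the open question above, so the `PartitionAwareRouter` class, the round-robin ownership, and the node names are all hypothetical. It also hashes a plain "bucket/key" string with SHA-1 onto a 160-bit ring, whereas Riak hashes its own encoding of the bucket and key, so the nodes chosen here would not match a real cluster.

```python
import hashlib

RING_SIZE = 2 ** 160  # size of the SHA-1 output space (assumed ring size)


class PartitionAwareRouter:
    """Hypothetical partition-aware router: maps a bucket/key to owning nodes."""

    def __init__(self, nodes, num_partitions=64, n_val=3):
        self.num_partitions = num_partitions
        self.n_val = n_val
        self.partition_size = RING_SIZE // num_partitions
        # Assumption: nodes claim partitions round-robin; a real client would
        # need the cluster's actual ownership table.
        self.owners = [nodes[i % len(nodes)] for i in range(num_partitions)]

    def key_hash(self, bucket, key):
        # Assumption: hash "bucket/key" as UTF-8; Riak's real encoding differs.
        digest = hashlib.sha1(f"{bucket}/{key}".encode("utf-8")).digest()
        return int.from_bytes(digest, "big")

    def preference_list(self, bucket, key):
        # Walk clockwise to the first partition boundary at or after the hash,
        # then take the owners of the next n_val partitions as replicas.
        h = self.key_hash(bucket, key)
        first = -(-h // self.partition_size) % self.num_partitions
        return [self.owners[(first + i) % self.num_partitions]
                for i in range(self.n_val)]

    def coordinator(self, bucket, key):
        # The first node in the preference list coordinates the request.
        return self.preference_list(bucket, key)[0]


if __name__ == "__main__":
    router = PartitionAwareRouter(["riak1", "riak2", "riak3", "riak4"])
    print(router.preference_list("users", "runar"))
    print(router.coordinator("users", "runar"))
```

A client built this way would send each request straight to the coordinator it computes, skipping the forwarding step the paper mentions, and could fall back to a generic load balancer (strategy 1) whenever its copy of the ring might be stale.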
