That comment is no longer correct, since Riak now (since 1.0, I believe) ignores client IDs. See https://github.com/basho/riak/blob/master/releasenotes/riak-1.0.org#getput-improvements
1. Siblings will only occur if you have allow_mult=true enabled on the bucket. The following applies only in that case.

2. Riak determines ordering by using the vector clock that comes with the write. The only way to get one is to read it from Riak. Thus, to do ordered writes you have to first read an object, modify it, then pass it back in. If the object was written by someone else in the meantime, you'll get a sibling.

3. Passing in no vclock (a fresh riak_object) now also creates a sibling if there is an existing object for the given key.

Kresten
Trifork

On 03/09/2012, at 21.33, "Bryan Hughes" wrote:

Heh - I found my own answer staring me in the face - http://wiki.basho.com/Vector-Clocks.html.

Concurrent writes

If two writes occur simultaneously from clients with different client IDs but the same vector clock value, Riak will not be able to determine the correct object to store and the object is given two siblings. These writes could happen to the same node or different ones.

Cheers,
Bryan

On 9/3/12 10:47 AM, Bryan Hughes wrote:

Thanks for the replies - this is very helpful. Our persistence abstraction layer already sports a robust process pool, as we do support other persistence solutions (although Riak is our main gun). I just needed to understand the relationship of the protobuffer client process to the Riak cluster as a whole. I understand now that the client process binds to an individual node in the cluster and not the cluster as a whole. I wasn't sure whether there might be some logic somewhere that handled a type of proxy (like Joe was referring to), so that each client connects to a single address and that proxy implements the necessary routing. Fortunately, I just need to add some round robin, affinity and load balancing management to our persistence layer.

From what I have been reading (including basic-client.txt in the riak/doc), the key is to ensure the same client binds to the same connection against the same node for subsequent writes? For example, if I have 1000 gen_server processes each reading and writing atomic values to the cluster, and a process uses connection X to node A for a write of record 100, the next write of record 100 should be on the same connection to the same node unless that node goes away. If I am understanding this correctly, having process A grab a random connection to a random node to write record 1, and then write record 2 on a different connection to either the same node or a different node, will result in nothing but siblings?

Thanks again!

Cheers,
Bryan

On 9/2/12 11:39 PM, Mark Phillips wrote:

Hi Bryan,

There have been at least four chunks of code released to handle connection pooling (in addition to poolboy): http://wiki.basho.com/Community-Developed-Libraries-and-Projects.html#Client-Libraries-and-Frameworks. (Scroll down to "Erlang".) These might be worth a look.

Mark
twitter.com/pharkmillups

On Sep 3, 2012, at 6:40, Joseph Lambert wrote:

Hi Bryan,

AFAIK, there is no built-in connection pooling for the Riak Erlang client. Each connection will only connect with one node and only that node, but since it's masterless you can connect to any node. You could roll your own connection pooling mechanism, or use something like Poolboy to handle it for you. Using Poolboy is convenient because it comes as a dependency of riak_core. If you use Poolboy, you'll have to modify riakc_pb_socket slightly to account for the way poolboy initializes connections (add a start_link/1), or create a simple module to pass the initialization from poolboy to riakc_pb_socket.

- Joe Lambert

On Mon, Sep 3, 2012 at 11:41 AM, Bryan Hughes wrote:

Hi Guys,

I have a question regarding Riak's protobuffer client gen_server process. I have a cluster of 5 nodes (machines), each with consecutive IP addresses. Our application is 100% Erlang and runs on its own machine.

The arguments to riakc_pb_socket:start_link/2 are an Address, a Port and the optional Options. The Address and Port are the address of the Riak server, but in the case of a masterless cluster of 5 machines, which address do I use? In reviewing the code for riakc_pb_socket.erl, the client opens a socket via gen_tcp to that particular node in the cluster and only that node. This means that there is a 1 to 1 connection between the Riak node and the client.

Is this correct? Maybe I am missing something? If so, then it looks like I need to implement my own round-robin algorithm across a pool of protobuffer clients that I am managing, each bound to a different node in the cluster, while testing "aliveness" with ping/2 and an immediate timeout?

Cheers,
Bryan

--
Bryan Hughes
Wobblesoft
http://www.wobblesoft.com
"Art is never finished, only abandoned. - Leonardo da Vinci"

_______________________________________________
riak-users mailing list
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
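
The round-robin idea from the original question (rotate across connections that are each bound to one node, skipping nodes that fail an "aliveness" check) can be sketched roughly as follows. This is a toy model in Python rather than Erlang and makes no calls to the real Riak client: the RoundRobinPool class, its checkout method and the is_alive callback are hypothetical names chosen for illustration, and the callback stands in for whatever ping-with-timeout check the application uses. Following Kresten's reply that client IDs are now ignored, the sketch does not attempt any client-to-node affinity.

```python
class RoundRobinPool:
    """Hands out connections in rotation, skipping any that fail the liveness check.

    Hypothetical sketch: 'connections' can be any objects (for example one
    client handle per cluster node) and 'is_alive' is a caller-supplied
    function that returns True when a connection is usable.
    """

    def __init__(self, connections, is_alive):
        self._connections = list(connections)
        if not self._connections:
            raise ValueError("pool needs at least one connection")
        self._is_alive = is_alive
        self._next = 0  # index of the connection to try first on the next checkout

    def checkout(self):
        """Return the next live connection, or raise if every connection is down."""
        size = len(self._connections)
        for offset in range(size):
            index = (self._next + offset) % size
            connection = self._connections[index]
            if self._is_alive(connection):
                # Resume after the connection just handed out.
                self._next = (index + 1) % size
                return connection
        raise RuntimeError("no live connections in the pool")


if __name__ == "__main__":
    down = {"node_b"}  # pretend this node is unreachable
    pool = RoundRobinPool(["node_a", "node_b", "node_c"], lambda c: c not in down)
    print([pool.checkout() for _ in range(4)])
```

Checking liveness at checkout time keeps the sketch short; a real pool would more likely track node health in the background so that a slow ping does not delay every request.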

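
Kresten's three points about siblings can be illustrated with a small vector clock model. This is only a sketch of the rules as stated in the thread, not Riak's implementation: the Store class, the dict-based clocks and the single "vnode" actor are assumptions made for illustration, and the allow_mult=False branch simply overwrites the value, which is a simplification.

```python
def descends(a, b):
    """True if clock a has seen every event recorded in clock b."""
    return all(a.get(actor, 0) >= count for actor, count in b.items())


class Store:
    """Toy single-key store that keeps siblings for concurrent writes (hypothetical)."""

    def __init__(self, allow_mult=True):
        self.allow_mult = allow_mult
        self.clock = {}    # vector clock covering everything stored so far
        self.values = []   # one value normally, several when siblings exist

    def get(self):
        # A read returns the value(s) plus the clock to pass back on the next write.
        return list(self.values), dict(self.clock)

    def put(self, value, vclock=None):
        incoming = dict(vclock or {})  # no vclock models a fresh object
        if not self.allow_mult or descends(incoming, self.clock):
            # The writer saw the current state (or siblings are disabled): replace.
            self.values = [value]
        else:
            # The writer's clock is stale or missing: keep both as siblings.
            self.values.append(value)
        actors = set(incoming) | set(self.clock)
        merged = {a: max(incoming.get(a, 0), self.clock.get(a, 0)) for a in actors}
        # The server side, not the client, advances the clock in this model.
        merged["vnode"] = merged.get("vnode", 0) + 1
        self.clock = merged


if __name__ == "__main__":
    store = Store()
    store.put("v1")                 # first write, nothing to conflict with
    _, vclock = store.get()
    store.put("v2", vclock)         # read-modify-write: replaces v1
    store.put("v3", vclock)         # same clock reused, now stale: sibling
    store.put("v4")                 # no vclock on an existing key: sibling
    print(store.values)
```

Resolving the siblings follows the same read-modify-write pattern: read the values and the current clock, pick or merge a value, and write it back with that clock.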