Re: riakc_pb_socket:start_link question with a riak cluster

Bryan Hughes Mon, 03 Sep 2012 17:46:57 -0700

Thanks Kresten! The challenges of fast development with aging docs and trying to keep up with everything. This indeed is very different and now makes a lot more sense and is back in line with our original assumptions (which apparently were wrong initially back with pre 1.0 and now are correct).

Just one more clarification - so if I have a 5 node cluster, and a pool of protobuffer client processes which have been balanced in a round robin manner across all nodes, does it matter if I have a worker gen_server process <0.100.0> which performs write 1 to node A and then write 2 to node B (not concurrently) using two different protobuffer client processes (<0.101.0> and <0.102.0> respectively)? Are there any caveats to this approach?


Also, we are on 1.2.

Cheers,
Bryan


On 9/3/12 1:37 PM, Kresten Krab Thorup wrote:

That comment is no longer correct since riak now (since 1.0 I believe) ignores 
client IDs. See 
https://github.com/basho/riak/blob/master/releasenotes/riak-1.0.org#getput-improvements

1. Siblings will only occur if you have allow_mult=true enabled on the bucket. 
The following applies only in that case.

2. The way ordering is determined inside riak is by using the vector clock 
coming with the write. The only way to get one is reading it from riak. Thus, 
to do ordered writes you have to first read an object, modify it, then pass it 
back in. If the object was written by someone else in the meantime you'll get 
a sibling.

3. Passing in no vclock (a fresh riak_object) now also creates a sibling if 
there is an existing object for the given key.
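
The read-modify-write cycle described in points 2 and 3 can be sketched roughly as follows. This is a minimal illustration, not code from the thread: the module name riak_rmw, the update/4 helper and the convention of calling Fun(undefined) for a missing key are all hypothetical, and it assumes the riakc_pb_socket and riakc_obj calls behave as in the Riak Erlang client README.

```erlang
-module(riak_rmw).
-export([update/4]).

%% Hypothetical helper: read the object (which carries its vector clock),
%% change the value, and write the same object back so that Riak can
%% order the write relative to the version that was read.
update(Pid, Bucket, Key, Fun) ->
    case riakc_pb_socket:get(Pid, Bucket, Key) of
        {ok, Obj0} ->
            %% riakc_obj:get_value/1 throws `siblings` when the object
            %% already holds more than one value; resolving siblings is
            %% left out of this sketch.
            NewValue = Fun(riakc_obj:get_value(Obj0)),
            Obj1 = riakc_obj:update_value(Obj0, NewValue),
            riakc_pb_socket:put(Pid, Obj1);
        {error, notfound} ->
            %% No existing object, so a fresh one (no vclock) is written.
            %% If another writer creates the key first, this put produces
            %% a sibling when allow_mult=true (point 3 above).
            Obj = riakc_obj:new(Bucket, Key, Fun(undefined)),
            riakc_pb_socket:put(Pid, Obj);
        {error, _} = Error ->
            Error
    end.
```

Because the ordering comes from the vector clock rather than from the connection, the Pid passed in can belong to any node in the cluster.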

Kresten
Trifork

On 03/09/2012, at 21.33, "Bryan Hughes" <[email protected]> wrote:

Heh - I found my own answer staring me in the face - 
http://wiki.basho.com/Vector-Clocks.html.

Concurrent writes: If two writes occur simultaneously from clients with 
different client IDs but the same vector clock value, Riak will not be able to 
determine the correct object to store and the object is given two siblings. 
These writes could happen to the same node or different ones.

Cheers,
Bryan

On 9/3/12 10:47 AM, Bryan Hughes wrote:
Thanks for the replies - this is very helpful.  Our persistence abstraction 
layer already sports a robust process pool as we do support other persistence 
solutions (although Riak is our main gun).   I just needed to understand the 
relationship of the protobuffer client process to the Riak cluster as a whole.  
I understand now that the client process binds to an individual node in the 
cluster and not the cluster as a whole.

I wasn't sure whether there might be some logic somewhere that handled a type of proxy 
(like Joe was referring to) so that each client connects to a single address 
and that proxy implements the necessary routing.

Fortunately, I just need to add some round robin and affinity and load 
balancing management to our persistence layer.  From what I have been reading 
(including basic-client.txt in the riak/doc), the key is to ensure the same 
client binds to the same connection against the same node for subsequent writes?

For example, if I have 1000 gen_server processes each reading and writing 
atomic values to the cluster, and a process uses connection X to node A for a 
write of record 100, the next write of record 100 should be on the same 
connection to the same node unless that node goes away.

If I am understanding this correctly, would process A writing record 1 on a 
random connection to a random node, and then writing record 2 on a different 
connection to either the same node or a different node, result in nothing but 
siblings?

Thanks again!

Cheers,
Bryan


On 9/2/12 11:39 PM, Mark Phillips wrote:
Hi Bryan,

There have been at least four chunks of code released to handle connection 
pooling (in addition to poolboy):

http://wiki.basho.com/Community-Developed-Libraries-and-Projects.html#Client-Libraries-and-Frameworks
(Scroll down to "Erlang".)

These might be worth a look.

Mark
twitter.com/pharkmillups


On Sep 3, 2012, at 6:40, Joseph Lambert <[email protected]> wrote:

Hi Bryan,

AFAIK, there is no built-in connection pooling for the Riak Erlang client. Each 
connection will only connect with one node and only that node, but since it's 
masterless you can connect to any node. You could roll your own connection 
pooling mechanism, or use something like Poolboy to handle it for you. Using 
Poolboy is convenient because it comes as a dependency of riak_core.

If you use Poolboy, you'll have to modify riakc_pb_socket slightly to account 
for the way poolboy initializes connections (add a start_link/1), or create a 
simple module to pass the initialization from poolboy to riakc_pb_socket.
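
The "simple module" option could look something like the sketch below. The module name riak_pool_worker, the host and port keys, and the defaults are assumptions made for illustration; the only thing taken from the thread is that poolboy starts workers through a start_link/1 that receives a single argument.

```erlang
-module(riak_pool_worker).
-export([start_link/1, conn_args/1]).

%% poolboy calls Module:start_link(WorkerArgs) with one argument, so this
%% wrapper unpacks a proplist and forwards to riakc_pb_socket:start_link/2.
start_link(Args) ->
    {Host, Port} = conn_args(Args),
    riakc_pb_socket:start_link(Host, Port).

%% Hypothetical option names; 8087 is assumed here as the protocol
%% buffers port of the target node.
conn_args(Args) ->
    Host = proplists:get_value(host, Args, "127.0.0.1"),
    Port = proplists:get_value(port, Args, 8087),
    {Host, Port}.
```

The pool configuration would then name riak_pool_worker as its worker module, with one pool per Riak node if connections are to be spread across the cluster.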

- Joe Lambert


On Mon, Sep 3, 2012 at 11:41 AM, Bryan Hughes <[email protected]> wrote:
Hi Guys,

I have a question regarding Riak's protobuffer client gen_server process.  I 
have a cluster of 5 nodes (machines), each with consecutive IP addresses.  Our 
application is 100% erlang and runs on its own machine.  The arguments to 
riakc_pb_socket:start_link are an Address and a Port (start_link/2), plus the 
optional Options (start_link/3).  The Address and Port are those of the riak 
server, but in the case of a masterless cluster of 5 machines, which address 
do I use?

In reviewing the code for riakc_pb_socket.erl, the client opens a socket via 
gen_tcp to that particular node in the cluster and only that node.  This means 
that there is a 1 to 1 connection between the riak node and the client.  Is 
this correct?  Maybe I am missing something?

If so, then it looks like I need to implement my own round-robin algorithm across a pool 
of protobuffer clients that I am managing, each bound to a different node in the cluster 
while testing "aliveness" with ping/2 and an immediate timeout?
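
A rough sketch of the round-robin idea in this question, under stated assumptions: the module name rr_pool, its function names and the 500 ms ping timeout are hypothetical, and the pool is just a list of client pids that the caller keeps in its own state.

```erlang
-module(rr_pool).
-export([connect/1, next/1, next_alive/1]).

%% Open one protobuffer client per {Host, Port} pair. Each client is
%% bound to exactly one node in the cluster.
connect(Nodes) ->
    [begin
         {ok, Pid} = riakc_pb_socket:start_link(Host, Port),
         Pid
     end || {Host, Port} <- Nodes].

%% Pure round-robin step: take the head and rotate it to the back.
next([Head | Tail]) ->
    {Head, Tail ++ [Head]}.

%% Walk the pool at most once and return the first client that answers
%% a ping within the (assumed) 500 ms timeout.
next_alive(Pool) ->
    next_alive(Pool, length(Pool)).

next_alive(_Pool, 0) ->
    {error, no_nodes_available};
next_alive(Pool0, Tries) ->
    {Pid, Pool1} = next(Pool0),
    case catch riakc_pb_socket:ping(Pid, 500) of
        pong -> {ok, Pid, Pool1};
        _ -> next_alive(Pool1, Tries - 1)
    end.
```

A caller would hold the rotated pool returned by next_alive/1 and pass it in on the next request, so that successive requests move across the nodes.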

Cheers,
Bryan

--

Bryan Hughes
Wobblesoft

http://www.wobblesoft.com

"Art is never finished, only abandoned. - Leonardo da Vinci"


_______________________________________________
riak-users mailing list
[email protected]<mailto:[email protected]>
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


