The Java ClusterClient uses a very simple round-robin, and downed nodes are not removed from the rotation. In addition, it uses a single global index that is incremented on every operation and every retry, by all client threads.
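
To make that failure mode concrete, here is a minimal sketch of a round-robin selector with one shared index. It is not the actual ClusterClient code: the class and method names, the node names, and the limit of 3 tries are all assumptions made for illustration. Other threads' selections are simulated in a single thread so the result is deterministic.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative only: hypothetical names, not the riak-java-client implementation.
final class RoundRobinSketch {

    /** Round-robin over a fixed node list; downed nodes are never removed. */
    static final class NaiveRoundRobin {
        private final List<String> nodes;
        // One index shared by every client thread, bumped on every selection.
        private final AtomicInteger index = new AtomicInteger();

        NaiveRoundRobin(List<String> nodes) {
            this.nodes = List.copyOf(nodes);
        }

        String next() {
            return nodes.get(Math.floorMod(index.getAndIncrement(), nodes.size()));
        }
    }

    /**
     * Simulates one operation that tries up to maxTries nodes (assumed retry limit).
     * After each failed try, `interleaved` selections by other threads advance
     * the shared index. Returns the node that served the operation, if any.
     */
    static Optional<String> attempt(NaiveRoundRobin rr, Set<String> down,
                                    int maxTries, int interleaved) {
        for (int i = 0; i < maxTries; i++) {
            String node = rr.next();
            if (!down.contains(node)) {
                return Optional.of(node);
            }
            for (int j = 0; j < interleaved; j++) {
                rr.next(); // another thread's operation takes the next slot
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<String> nodes = List.of("riak1", "riak2", "riak3"); // hypothetical node names
        Set<String> down = Set.of("riak2");

        // No other traffic: the retry moves on to riak3 and succeeds.
        NaiveRoundRobin quiet = new NaiveRoundRobin(nodes);
        quiet.next(); // some earlier operation used riak1
        System.out.println(attempt(quiet, down, 3, 0)); // Optional[riak3]

        // Two other selections between retries: with 3 nodes, every retry
        // lands on riak2 again and the operation fails.
        NaiveRoundRobin busy = new NaiveRoundRobin(nodes);
        busy.next();
        System.out.println(attempt(busy, down, 3, 2)); // Optional.empty
    }
}
```

In a real client the interleaving depends on thread timing, so the outcome is a matter of probability rather than certainty, but the mechanism is the same.
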
Especially with 3 nodes and any amount of load it is likely a single thread will end up retrying the same downed node and then fail the operation.

The best solution currently is to use HAProxy or another load balancer. This is something we aim to improve as we further develop the Java client.

Thanks!
Brian Roach

On Tue, Oct 16, 2012 at 2:31 AM, Philippe Guillebert <[email protected]> wrote:
> Hi list,
>
> We have a cluster of three Riak 0.14.2 nodes in production and quite happy
> with it. I'm planning the upgrade to 1.2.0 and while testing it, I wondered
> about how a client should behave during a rolling upgrade (1 node is down
> for maintenance but the cluster is working).
>
> My expectations for a client is, if a given node is down the client will try
> on another node of the cluster to "hide" the maintenance to the upper layers
> of my application.
>
> I tried with Clojure client Welle (internally it uses a PBClusterClient) and
> it didn't work. As soon as I stop a Riak node, the client throws Connection
> Refused exceptions (instead of retrying elsewhere).
> Our Java client library (uses PBClusterClient) has the same problem.
>
> So I realized here that if I restart a node (for maintenance) on my live
> cluster, my app breaks ?!?
>
> I tried googling but there is a lot of contradictory opinions out there :
>
> On the wiki
> https://github.com/basho/riak-java-client/wiki/ClientFactory#wiki-example3
> it says I should use another class of client :
>
> IRiakClient myPbClient = RiakFactory.newClient(myPbClusterConfig);
>
> Will this client retry correctly ? Does this mean the Welle developers used
> the "wrong" client ?
>
> This message on the list states that PBClusterClient should work as I expect
> :
> http://lists.basho.com/pipermail/riak-users_lists.basho.com/2012-March/007949.html
> but this message states that ClusterClient is not working as expected :
> http://comments.gmane.org/gmane.comp.db.riak.user/8680
>
> Can you help me keep my sanity here ?
> Thank you !
>
> --
> Philippe
>
> _______________________________________________
> riak-users mailing list
> [email protected]
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
