I recently rolled out an upgrade to a JVM app, moving it from
riak-java-client (rjc) 1.0.5 to 1.0.6. We upgraded to take advantage
of the newly added ability to do a put without a preceding fetch, in
order to reduce operational load on the cluster. However, since
rolling out this change we frequently see large rises in latency
across the cluster (up to the gen_fsm limit of 60s), along with the
following in the Riak logs:
[error] Unrecognized message {74392380,{error,timeout}}
This is accompanied by repeated socket timeouts as seen by the riak-java-client.
Also worth mentioning: one of our nodes got into a state in which the
rjc was unable to establish a TCP connection to Riak on the protobuf
port on localhost. We were only able to fix this by restarting the
Riak process on that node, which induced a fair amount of handoff.
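
For anyone wanting to check a node for the same condition, here is a
minimal sketch of a plain TCP connect test against the protobuf port.
It assumes Riak's default protobuf port of 8087 (pass a different port
as the first argument if yours is configured otherwise). The class
PbPortCheck and its canConnect method are hypothetical names of my own
and are not part of the riak-java-client. It only shows whether the
port accepts a TCP connection, not whether Riak answers requests.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper for illustration; not part of riak-java-client.
class PbPortCheck {

    // Returns true if a TCP connection to host:port is established
    // within timeoutMillis, false if it is refused, times out, or
    // otherwise fails.
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Covers connection refused, SocketTimeoutException, etc.
            return false;
        }
    }

    public static void main(String[] args) {
        // 8087 is assumed here as the default protobuf port.
        int port = args.length > 0 ? Integer.parseInt(args[0]) : 8087;
        boolean ok = canConnect("127.0.0.1", port, 2000);
        System.out.println("protobuf port " + port
                + (ok ? " accepts" : " does NOT accept")
                + " TCP connections on localhost");
    }
}
```

Running this from the affected node itself, against 127.0.0.1, takes
the network between hosts out of the picture.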
Any thoughts?
Thanks,
D