I've just hit a new problem with Riak. A hard drive recently failed on one
of the Riak nodes, so I had to shut that node down. I'm now running 4 nodes,
and every 10 minutes all of them start to fail with
'Error: {error,mailbox_overload}' until they are restarted. Can anyone from
Basho please suggest a solution or fix for this? My whole cluster is
unusable with just 1 node failed.

On 5 November 2014 00:11, Oleksiy Krivoshey <[email protected]> wrote:

> There were also errors during initial handoff, here is a full console.log
> for that day: https://www.dropbox.com/s/o7zop181pvpxoa5/console.log?dl=0
>
> I actually replaced two nodes that day. First one went smoothly as it
> should. The second one resulted in the situation above. I replaced the
> first one and then the second after few hours.
>
> On 4 November 2014 20:44, Oleksiy Krivoshey <[email protected]> wrote:
>
>> Hi,
>>
>> I'm running a 5 node cluster (Riak 2.0.0) and I had to replace hardware
>> on one of the servers. So I did a 'cluster leave', waited till the node
>> exited, checked the ring status and members status, all was ok, with no
>> pending changes. Then later after about 5 minutes every client connection
>> to any of the 4 remaining nodes started to fail with
>>
>> [Error: {error,mailbox_overload}
>>
>> I have restarted one node after another and the error has gone. However I
>> was still experiencing connectivity issues (timeouts) and riak error log is
>> full of various errors even after I joined the 5th node back.
>>
>> Error are like:
>>
>> Failed to merge
>> {["/var/lib/riak/bitcask_expire_1d/685078892498860742907977265335757665463718379520/1.bitcask.data"]
>>
>> gen_fsm <0.818.0> in state active terminated with reason: bad record
>> state in riak_kv_vnode:set_vnode_forwarding/2 line 991
>>
>> @riak_pipe_vnode:new_worker:826 Pipe worker startup failed:
>>
>>
>> msg,7,[{file,"gen_fsm.erl"},{line,505}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,239}]}]
>> 2014-11-04 16:07:57.124 [error]
>> <0.11128.0>@riak_core_handoff_sender:start_fold:279 hinted_handoff transfer
>> of riak_kv_vnode from '[email protected]'
>> 353957427791078050502454920423474793822921162752 to '[email protected]'
>> 353957427791078050502454920423474793822921162752 failed because
>> of error:undef
>> [{riak_core_format,human_size_fmt,["~.2f",588],[]},{riak_core_handoff_sender,start_fold,5,[{file,"src/riak_core_handoff_sender.erl"},{line,246}]}]
>>
>> The full error log file is available here:
>> https://www.dropbox.com/s/3b8x3nqyego7lw3/error.log?dl=0
>>
>> There was no significant load on Riak so I would like to understand what
>> caused so many errors?
>>
>> --
>> Oleksiy
>>
>
> --
> Oleksiy Krivoshey
>

--
Oleksiy Krivoshey
_______________________________________________
riak-users mailing list
[email protected]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
