I have actually described two recent situations here. In the first one, there were no failed nodes or servers. I did a clean 'leave' on one node and waited until it exited. I upgraded the hardware on it, started the server again (with a clean Riak setup), and then did a 'join'. A few hours later I repeated this on a second server, again doing a clean 'leave'. That's when I started getting mailbox_overload errors.
In the second situation, which happened two days later, a hard drive failed on one server, so I had to run 'down' and 'force-remove' for that node. Everything was fine while the server was marked as down. When I brought the new server back, these errors started again. This time they lasted five hours in total, making my Riak cluster and application unavailable. I did all the monitoring I could (while trying to keep the application live), and it seems that AAE was simply exchanging (repairing) keys by the thousands. If I disabled AAE by making it passive, my application worked (with the limitations of relying on read repair). As soon as I switched AAE back to active, I got thousands of mailbox_overload errors. I tried configuring the mailbox tier throttles, with no luck. I thought I was running on quite good hardware (64GB RAM, RAID10, a hex-core Intel Xeon, and a private gigabit network just for Riak).

On 7 November 2014 06:06, Scott Lystig Fritchie <[email protected]> wrote:

> Sargun Dhillon <[email protected]> wrote:
>
> sd> Can you run: [...]
>
> Hi, Sargun and Oleksiy. Those commands and a lot more are run as part
> of the suite of info-gathering done by the "riak-debug" utility. I
> recommend using it instead of managing a hodge-podge of separate
> commands.
>
> The output from "riak-admin cluster-info" is also exceptionally helpful,
> especially because it contains even more diagnostic information,
> especially about Erlang process mailbox contents. I recommend running
> it during overload conditions to see what's going on internally.
>
> Also, "riak-admin top -sort msg_q" can give a real-time view of Erlang
> mailbox sizes, sorted by mailbox size.
>
> -Scott

--
Oleksiy Krivoshey
_______________________________________________ riak-users mailing list [email protected] http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
