I have actually described two recent situations here. In the first one,
there were no failed nodes or servers. I did a clean 'leave' on one node
and waited until it exited. I upgraded its hardware and started the server
again (with a clean Riak setup), then did a 'join'. A few hours later I
repeated this on a second server, again with a clean 'leave'. That's when
I started experiencing mailbox_overload errors.

In the second situation, which happened 2 days later, a hard drive failed
on one server, so I had to do 'down' and 'force-remove' for this node.
Everything was OK while the server was marked as down. When I brought a
new server back, the errors started again. This time they lasted for 5
hours in total, making my Riak cluster and application unavailable. I did
all sorts of monitoring when I was able to (while trying to keep the
application live), and it seems that it was just AAE exchanging
(repairing) keys by the thousands. If I disabled AAE by making it passive,
I was able to make my application work (with some limitations of read
repair). As soon as I switched AAE back to active, I got thousands of
mailbox_overload errors. I tried configuring the mailbox tiers throttle,
with no luck.

I'm running on quite good hardware, or so I thought (64GB RAM, RAID10,
Intel Xeon hex-core and a private gigabit network just for Riak).



On 7 November 2014 06:06, Scott Lystig Fritchie <[email protected]>
wrote:

> Sargun Dhillon <[email protected]> wrote:
>
> sd> Can you run: [...]
>
> Hi, Sargun and Oleksiy.  Those commands and a lot more are run as part
> of the suite of info-gathering done by the "riak-debug" utility.  I
> recommend using it instead of managing a hodge-podge of separate
> commands.
>
> The output from "riak-admin cluster-info" is also exceptionally helpful,
> especially because it contains even more diagnostic information,
> especially about Erlang process mailbox contents.  I recommend running
> it during overload conditions to see what's going on internally.
>
> Also, "riak-admin top -sort msg_q" can give a real-time view of Erlang
> mailbox sizes, sorted by mailbox size.
>
> -Scott
>



-- 
Oleksiy Krivoshey