Re: riak cluster suddenly became unresponsive

Mark Phillips Tue, 19 Mar 2013 07:05:18 -0700

Hi Ingo,

Sorry for the delay in getting back to you.


This looks symptomatic of some of the scheduler issues we fixed in 1.3. A
few of the eleveldb entries in the release notes [1] provide precise
details. Is upgrading a possibility?

Raising your zdbbl in vm.args should alleviate some of the issues with
busy buffers, but upgrading is probably your best path here.
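
For reference, the zdbbl change is a single line in vm.args. This is a
minimal sketch, assuming the 16384 value from the thread quoted below; the
unit is kilobytes, the right size for this cluster is not established here,
and each node needs a restart to pick up the change:

```
## Distribution buffer busy limit, in kilobytes (Erlang VM default: 1024).
## 16384 KB = 16 MB; this value is taken from the quoted message and is an
## assumption, not a tuned recommendation.
+zdbbl 16384
```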

Hope that helps. Keep us posted.

Mark

[1] https://github.com/basho/riak/blob/master/RELEASE-NOTES.md

On Friday, March 15, 2013, Ingo Rockel wrote:

> Hi,
>
> we have a 12-node cluster running riak 1.2.1 which went live a week ago.
> Yesterday, suddenly from one minute to another, the put_fsm_time_95 and the
> get_fsm_time_95 rose from something below 100ms up to several seconds.
> This went on for about 25 min and then went away.
>
> Checking the riak-logs of the nodes, I find a lot of these:
>
> 2013-03-14 17:48:06.388 [info] <0.62.0>@riak_core_sysmon_handler:handle_event:89 Monitor got {suppressed,port_events,1}
> 2013-03-14 17:48:06.889 [info] <0.62.0>@riak_core_sysmon_handler:handle_event:85 monitor busy_dist_port <0.7156.1> [{initial_call,{riak_core_vnode,init,1}},{almost_current_function,{erlang,bif_return_trap,1}},{message_queue_len,1}] {#Port<0.9083226>,'[email protected]'}
>
> These messages are logged all day, but only once every few minutes. In
> the problematic time frame between 17:45 and 18:17 they are logged several
> times every second. The node IP differs, but it seems only three
> nodes were involved.
>
> Except for these three nodes, the CPU utilisation drops by half during
> this period on all other nodes. On the three nodes there's only a slight
> drop.
>
> We are using leveldb as storage backend. I also checked some of the LOG
> files of leveldb and there are compactions logged, but these are logged all
> the day every few hours.
>
> During this time our software was quite unresponsive too, so I would like
> to know what was causing this and what I might do to stop it. Any ideas
> or hints?
>
> I found this:
>
> https://groups.google.com/forum/?fromgroups=#!topic/nosql-databases/GqbaeiKCSYE
>
> where Jon Meredith suggests raising the buffer size to get rid of the
> busy buffers by adding +zdbbl 16384 to the vm.args file. Might this help?
>
> Regards,
>
>         Ingo
> --
> Software Architect
>
> Blue Lion mobile GmbH
> Tel. +49 (0) 221 788 797 14
> Fax. +49 (0) 221 788 797 19
> Mob. +49 (0) 176 24 87 30 89
>
> [email protected]
> >>> qeep: Hefferwolf
>
> www.bluelionmobile.com
> www.qeep.net
>
> _______________________________________________
> riak-users mailing list
> [email protected]
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
