Re: No more disk space on a node of my Riak cluster

Evan Vigil-McClanahan Thu, 21 Mar 2013 12:35:26 -0700

busy_dist_port usually means that you're trying to push too much data
over your distributed Erlang connections, but there are also other
things that can trigger it.
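
To see whether the events cluster on one connection, it can help to tally them. The following is only a rough sketch: the regular expression is modelled on the log lines quoted further down in this thread, the function names are made up for illustration, and the log path you pass in depends on your install. It treats the atom printed next to the port as the peer node, which is an assumption.

```python
import re
from collections import Counter

# Pattern modelled on the busy_dist_port lines quoted in this thread.
# DOTALL lets it survive entries that were wrapped across lines.
EVENT = re.compile(
    r"busy_dist_port\s+(?P<pid><[\d.]+>)\s+\[.*?\]\s+"
    r"\{(?P<port>#Port<[\d.]+>),'(?P<node>[^']*)'\}",
    re.DOTALL,
)


def tally_busy_dist_ports(log_text):
    """Count busy_dist_port events per (port, node name) pair."""
    counts = Counter()
    for match in EVENT.finditer(log_text):
        counts[(match.group("port"), match.group("node"))] += 1
    return counts


def report(path):
    """Print the tally for one log file, busiest connection first.

    `path` is whatever file your node logs to (hypothetical here).
    """
    with open(path) as handle:
        counts = tally_busy_dist_ports(handle.read())
    for (port, node), n in counts.most_common():
        print(f"{n:6d}  {port}  {node}")
```

If nearly all events point at a single port, that connection is the one to look at first; an even spread suggests the node is simply sending more than the buffers allow.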


You can tune the amount of buffer that distributed Erlang provides by adding
+zdbbl <some_number>

to vm.args, where <some_number> is the buffer size *in KB* *per other node in the
cluster*.  The default is 1024 (i.e. one megabyte), but raising it to
8192 or 16384 (8 or 16 MB) is often called for in busy clusters.  If that
doesn't help, you may be affected by one of the other causes that I
mentioned.  That said, I am not sure that this will help handoff, as it
does not go over distributed Erlang.
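
Because the value is in KB and applies per other node, the arithmetic is easy to get wrong. A small sketch of the conversion follows; the helper name and the five-node cluster are made up for illustration, and the total is only a rough ceiling on queued outgoing data, since the setting is a busy limit rather than a preallocated buffer.

```python
def zdbbl_setting(buffer_mb, cluster_size):
    """Return the +zdbbl line and a rough ceiling on total buffering in KB."""
    per_node_kb = buffer_mb * 1024          # +zdbbl takes kilobytes
    other_nodes = max(cluster_size - 1, 0)  # one buffer per *other* node
    return f"+zdbbl {per_node_kb}", per_node_kb * other_nodes


# Hypothetical example: 8 MB per connection in a five-node cluster.
line, total_kb = zdbbl_setting(buffer_mb=8, cluster_size=5)
print(line)                    # +zdbbl 8192
print(total_kb // 1024, "MB")  # 32 MB
```

The flag is read when the Erlang VM starts, so the node has to be restarted for a new value to take effect.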



On Thu, Mar 21, 2013 at 12:22 PM, Godefroy de Compreignac
<[email protected]> wrote:
> Ok thanks Evan.
> I didn't post them before on this thread because I thought it was just
> "info", but I have a lot of messages like these:
>
> 2013-03-21 20:13:23.945 [info]
> <0.12276.176>@riak_core_sysmon_handler:handle_event:85 monitor
> busy_dist_port <0.29558.176>
> [{initial_call,{riak_core_vnode,init,1}},{almost_current_function,{gen_fsm,loop,7}},{message_queue_len,0}]
> {#Port<0.4656154>,'[email protected]'}
> 2013-03-21 20:13:26.295 [info]
> <0.12276.176>@riak_core_sysmon_handler:handle_event:85 monitor
> busy_dist_port <0.29046.176>
> [{initial_call,{riak_core_vnode,init,1}},{almost_current_function,{gen_fsm,loop,7}},{message_queue_len,0}]
> {#Port<0.21529382>,'[email protected]'}
> 2013-03-21 20:13:27.807 [info]
> <0.12276.176>@riak_core_sysmon_handler:handle_event:85 monitor
> busy_dist_port <0.29046.176>
> [{initial_call,{riak_core_vnode,init,1}},{almost_current_function,{erlang,crc32,2}},{message_queue_len,0}]
> {#Port<0.21529382>,'[email protected]'}
> 2013-03-21 20:13:27.843 [info]
> <0.12276.176>@riak_core_sysmon_handler:handle_event:85 monitor
> busy_dist_port <0.21168.176>
> [{initial_call,{riak_core_vnode,init,1}},{almost_current_function,{gen_fsm,loop,7}},{message_queue_len,0}]
> {#Port<0.6629407>,'[email protected]'}
> 2013-03-21 20:13:30.626 [info]
> <0.12276.176>@riak_core_sysmon_handler:handle_event:85 monitor
> busy_dist_port <0.29558.176>
> [{initial_call,{riak_core_vnode,init,1}},{almost_current_function,{gen_fsm,loop,7}},{message_queue_len,0}]
> {#Port<0.6629407>,'[email protected]'}
> 2013-03-21 20:13:30.771 [info]
> <0.12276.176>@riak_core_sysmon_handler:handle_event:85 monitor
> busy_dist_port <0.24361.176>
> [{initial_call,{riak_core_vnode,init,1}},{almost_current_function,{gen_fsm,loop,7}},{message_queue_len,0}]
> {#Port<0.4656154>,'[email protected]'}
> 2013-03-21 20:13:34.447 [info]
> <0.12276.176>@riak_core_sysmon_handler:handle_event:85 monitor
> busy_dist_port <0.29558.176>
> [{initial_call,{riak_core_vnode,init,1}},{almost_current_function,{gen_fsm,loop,7}},{message_queue_len,0}]
> {#Port<0.4656154>,'[email protected]'}
> 2013-03-21 20:13:36.210 [info]
> <0.12276.176>@riak_core_sysmon_handler:handle_event:85 monitor
> busy_dist_port <0.6726.946>
> [{initial_call,{riak_core_vnode,init,1}},{almost_current_function,{gen_fsm,loop,7}},{message_queue_len,0}]
> {#Port<0.4656154>,'[email protected]'}
> 2013-03-21 20:13:36.501 [info]
> <0.12276.176>@riak_core_sysmon_handler:handle_event:85 monitor
> busy_dist_port <0.32186.176>
> [{initial_call,{riak_core_vnode,init,1}},{almost_current_function,{gen_fsm,loop,7}},{message_queue_len,0}]
> {#Port<0.6629407>,'[email protected]'}
>
> I guess I have a problem with my network config...
>
> I should point out that the servers hosting my Riak cluster are also
> running Couchbase, Nginx and Elasticsearch, so there is a lot of traffic
> and many connections.
> /proc/sys/net/netfilter/nf_conntrack_count = 30-100K
>
>
>
> --
> Godefroy de Compreignac
>
> Eklaweb CEO - www.eklaweb.com
> EklaBlog CEO - www.eklablog.com
>
> +33(0)6 11 89 13 84
> http://www.linkedin.com/in/godefroy
> http://twitter.com/Godefroy
>
>
> 2013/3/21 Evan Vigil-McClanahan <[email protected]>
>>
>> It could be a large number of things, unfortunately.  Going through
>> them all is somewhat outside of my skill set.  Maybe someone more
>> network savvy can provide some pointers?
>>
>> Perhaps checking with your network admin, or turning any software
>> firewalls on your nodes completely off, as a test?
>>
>> On Thu, Mar 21, 2013 at 11:44 AM, Godefroy de Compreignac
>> <[email protected]> wrote:
>> > But I don't understand what could stop transfers. Maybe a kernel
>> > setting?
>> > How could I find out?
>> >
>> > --
>> > Godefroy de Compreignac
>> >
>> > Eklaweb CEO - www.eklaweb.com
>> > EklaBlog CEO - www.eklablog.com
>> >
>> > +33(0)6 11 89 13 84
>> > http://www.linkedin.com/in/godefroy
>> > http://twitter.com/Godefroy
>> >
>> >
>> > 2013/3/21 Evan Vigil-McClanahan <[email protected]>
>> >>
>> >> Handoff is done by default on port 8099.
>> >>
>> >> I guess what I am getting at here is that this doesn't look like an
>> >> obvious riak problem, it's more likely that something on your network
>> >> or on your nodes is closing or interrupting those sockets; you'd most
>> >> likely get a different error if something internal to riak was causing
>> >> the transfers to fail.
>> >>
>> >> On Thu, Mar 21, 2013 at 10:09 AM, Godefroy de Compreignac
>> >> <[email protected]> wrote:
>> >> > The only limitation that I can see is HAProxy, which has time limits:
>> >> >         contimeout      5000
>> >> >         clitimeout      50000
>> >> >         srvtimeout      3600000
>> >> >
>> >> > But HAProxy serves Riak on port 8098 and I configured Riak to use
>> >> > port 8097:
>> >> > {pb_port, 8087 }
>> >> > {http, [ {"5.39.68.152", 8097 } ]}
>> >> >
>> >> > So I guess Riak uses only port 8097 internally, without any
>> >> > limitation.
>> >> >
>> >> > And by checking logs, I see that a vnode transfer fails after a
>> >> > random duration, sometimes a few minutes.
>> >
>> >
>
>

_______________________________________________
riak-users mailing list
[email protected]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
