I increased the memory on the Riak VMs to 3GB and also converted a
JavaScript reduce function I had previously missed over to Erlang.
Monitoring the memory on the machines indicates that Riak is not running
out of memory, and there is plenty of disk space on the machines (~30GB
free).

Riak crashed again this morning with an {error, flow_timeout} (below). My use
case: ~75 map/reduce jobs run about every 10 minutes over roughly 8700 items
in a bucket. Once an hour, each of the 8700 items is updated with a write.

=SUPERVISOR REPORT==== 21-Jun-2011::02:34:34 ===
     Supervisor: {local,luke_phase_sup}
     Context:    child_terminated
     Reason:     {error,flow_timeout}
     Offender:
[{pid,<0.7243.26>},{name,undefined},{mfa,{luke_phase,start_lin\
k,[riak_kv_reduce_phase,3,[accumulate,converge],undefined,<0.7241.26>,66000,[{e\
rlang,{reduce, ...


=ERROR REPORT==== 21-Jun-2011::02:34:34 ===
** State machine <0.7274.26> terminating
** Last message in was {'EXIT',<0.7241.26>,{error,flow_timeout}}
** When State == waiting_kl
**      Data  == {state,<0.7241.26>,mapred,<<>>,
                     [[{639406966332270026714112114313373821099470487552,
                        '[email protected]'},


- Jeremy


On Mon, Jun 20, 2011 at 1:19 PM, Greg Nelson <[email protected]> wrote:

>  I see this from time to time on our production 5-node cluster, with no
> indications of any other problems.  And I'm certain it's not a memory or
> disk issue.
>
> On Sunday, June 19, 2011 at 6:01 PM, Jeremy Raymond wrote:
>
> Actually it's a bit later on where I see this:
>
> ===== Fri Jun 17 16:26:46 EDT 2011
>
> =ERROR REPORT==== 17-Jun-2011::16:26:46 ===
> ** Generic server riak_kv_stat terminating
> ** Last message in was {'$gen_cast',{update,vnode_get,63475547206}}
> ** When Server state == {state,
>                             {spiral,63475547266,
>
> [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
>
>  0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
>                                  0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]},
>                             {spiral,63475547266,
>
> [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
>
>  0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
>                                  0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]},
>                             2846696,42859,35845,31080,
>                             {slide,63475532874,60,60,
>
> "/tmp/riak/slide-data/474/1308.328066.57288",
>                                 {file_descriptor,prim_file,
>                                     {#Port<0.64895>,24}},
>                                 63475547205},
>                             {slide,63475532874,60,60,
>
> "/tmp/riak/slide-data/474/1308.328066.59517",
>                                 {file_descriptor,prim_file,
>                                     {#Port<0.64896>,13}},
>                                 63475547205},
>                             {spiral,63475547266,
>
> [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
>
>  0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
>                                  0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]},
>                             978,4,
>                             {spiral,63475547266,
>
> [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
>
>  0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
>                                  0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]},
>                             21375,21}
> ** Reason for termination ==
> ** {badarg,[{erlang,hd,[[]]},
>             {spiraltime,incr,3},
>             {riak_kv_stat,spiral_incr,3},
>             {riak_kv_stat,handle_cast,2},
>             {gen_server2,handle_msg,7},
>             {proc_lib,init_p_do_apply,3}]}
> ([email protected])3> /usr/lib/riak/lib/os_mon-2.2.5/priv/bin/memsup: Erlang
> has closed.
> Erlang has closed
>
>
> Other than that, the only thing that stands out is the info messages
> about system_memory_high_watermark.
>
> - Jeremy
>
>
> On Sun, Jun 19, 2011 at 8:47 PM, Jeremy Raymond <[email protected]> wrote:
>
> I see these messages near the time of the crash. Indicators of low system
> memory?
>
> =INFO REPORT==== 17-Jun-2011::12:27:35 ===
> Spidermonkey VM (pool: riak_kv_js_map) host stopping (<0.160.0>)
> ([email protected])1>
> =INFO REPORT==== 17-Jun-2011::12:27:35 ===
> Spidermonkey VM (pool: riak_kv_js_map) host stopping (<0.159.0>)
> ([email protected])1> /usr/lib/riak/lib/os_mon-2.2.5/priv/bin/memsup:
> Erlang has closed.
>
> =INFO REPORT==== 17-Jun-2011::12:27:35 ===
> [{alarm_handler,{clear,system_memory_high_watermark}}]([email protected])1>
> Erlang has closed
>
>
> - Jeremy
>
>
>
> On Sun, Jun 19, 2011 at 4:52 PM, Sean Cribbs <[email protected]> wrote:
>
> Jeremy,
>
> There is an open pull request for fixing this bug.  Details of the bug are
> here: https://issues.basho.com/show_bug.cgi?id=1072  It will not cause
> your entire node to crash; there is likely some other cause. Node exits
> tend to only happen when you run out of RAM or disk space, or can't bind to
> a TCP port during riak_core startup. Were there other indications in the
> erlang.log.*?
>
> --
> Sean Cribbs <[email protected]>
> Developer Advocate
> Basho Technologies, Inc.
> http://www.basho.com/
>
>
> On Sat, Jun 18, 2011 at 9:53 PM, Jeremy Raymond <[email protected]> wrote:
>
> Hello,
>
> I have a 3 node Riak 0.14.2 cluster from the deb packages running on Ubuntu
> 10.10. I had a node go down with the following error from the
> sasl-error.log. Any ideas on tracking down the cause?
>
> - Jeremy
>
>
> =ERROR REPORT==== 17-Jun-2011::16:26:46 ===
> ** Generic server riak_kv_stat terminating
> ** Last message in was {'$gen_cast',{update,vnode_get,63475547206}}
> ** When Server state == {state,
>                             {spiral,63475547266,
>
> [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
>
>  0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
>                                  0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]},
>                             {spiral,63475547266,
>
> [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
>
>  0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
>                                  0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]},
>                             2846696,42859,35845,31080,
>                             {slide,63475532874,60,60,
>
> "/tmp/riak/slide-data/474/1308.328066.57288",
>                                 {file_descriptor,prim_file,
>                                     {#Port<0.64895>,24}},
>                                 63475547205},
>                             {slide,63475532874,60,60,
>
> "/tmp/riak/slide-data/474/1308.328066.59517",
>                                 {file_descriptor,prim_file,
>                                     {#Port<0.64896>,13}},
>                                 63475547205},
>                             {spiral,63475547266,
>
> [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
>
>  0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
>                                  0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]},
>                             978,4,
>                             {spiral,63475547266,
>
> [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
>
>  0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
>                                  0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]},
>                             21375,21}
> ** Reason for termination ==
> ** {badarg,[{erlang,hd,[[]]},
>             {spiraltime,incr,3},
>             {riak_kv_stat,spiral_incr,3},
>             {riak_kv_stat,handle_cast,2},
>             {gen_server2,handle_msg,7},
>             {proc_lib,init_p_do_apply,3}]}
>
> =CRASH REPORT==== 17-Jun-2011::16:26:46 ===
>   crasher:
>     initial call: gen:init_it/7
>     pid: <0.151.0>
>     registered_name: riak_kv_stat
>     exception exit:
> {badarg,[{erlang,hd,[[]]},{spiraltime,incr,3},{riak_kv_stat,spiral_incr,3},{riak_kv_stat,handle_cast,2},{gen_server2,handle_msg,7},{proc_lib,init_p_do_apply,3}]}
>       in function  gen_server2:terminate/6
>       in call from proc_lib:init_p_do_apply/3
>     ancestors: [riak_kv_sup,<0.138.0>]
>     messages:
> [{$gen_cast,{update,vnode_get,63475547206}},{$gen_cast,{update,vnode_get,63475547206}},{$gen_cast,{update,vnode_get,63475547206}},{$gen_cast,{update,vnode_get,63475547206}},{$gen_cast,{update,vnode_get,63475547206}},{$gen_cast,{update,vnode_get,63475547206}},{$gen_cast,{update,vnode_get,63475547206}},{$gen_cast,{update,vnode_get,63475547206}},{$gen_cast,{update,mapper_end,63475547206}},{$gen_cast,{update,vnode_get,63475547206}},{$gen_cast,{update,mapper_end,63475547206}},{$gen_cast,{update,vnode_get,63475547206}},{$gen_cast,{update,vnode_get,63475547206}},{$gen_cast,{update,vnode_get,63475547206}},{$gen_cast,{update,vnode_get,63475547206}},{$gen_cast,{update,vnode_get,63475547206}},{$gen_cast,{update,vnode_get,63475547206}},{$gen_cast,{update,vnode_get,63475547206}},{$gen_cast,{update,vnode_get,63475547206}},{$gen_cast,{update,vnode_get,63475547206}},{$gen_cast,{update,vnode_put,63475547206}}]
>     links: [#Port<0.64896>,<0.145.0>,#Port<0.64895>]
>     dictionary: []
>     trap_exit: true
>     status: running
>     heap_size: 2584
>     stack_size: 24
>     reductions: 81329187
>   neighbours:
>
> =SUPERVISOR REPORT==== 17-Jun-2011::16:26:46 ===
>      Supervisor: {local,riak_kv_sup}
>      Context:    child_terminated
>      Reason:
> {badarg,[{erlang,hd,[[]]},{spiraltime,incr,3},{riak_kv_stat,spiral_incr,3},{riak_kv_stat,handle_cast,2},{gen_server2,handle_msg,7},{proc_lib,init_p_do_apply,3}]}
>      Offender:
> [{pid,<0.151.0>},{name,riak_kv_stat},{mfa,{riak_kv_stat,start_link,[]}},{restart_type,permanent},{shutdown,5000},{child_type,worker}]
>
>
> _______________________________________________
> riak-users mailing list
> [email protected]
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
>
>
>
>
>
>
