Hi Gordon,

I've looked at the information you provided, and I suspect the problem is 
related to your use of 90 as the ring creation size. We generally recommend 
that the value be a power of 2, though the code does not explicitly enforce 
that. If this is a development cluster, the simplest path may be to wipe all of 
the cluster data and ring information, change ring_creation_size to 128 
(for example) on each node, re-form the cluster, and reinsert the data. 
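
To illustrate the power-of-2 recommendation, here is a minimal Python sketch 
(not Riak code; the helper names are made up for this example). It checks 
whether a candidate ring size is a power of 2 and whether it divides the 
2^160 hash space evenly. That uneven division is what leads to the crash is 
an assumption on my part, not something confirmed from the source. 

```python
# Illustrative only: these helpers are hypothetical and not part of Riak.
RING_TOP = 2 ** 160  # size of the 160-bit hash space the ring is divided over


def is_power_of_two(n: int) -> bool:
    """Return True if n is a positive power of two."""
    return n > 0 and (n & (n - 1)) == 0


def next_power_of_two(n: int) -> int:
    """Return the smallest power of two that is >= n (for n >= 1)."""
    return 1 << (n - 1).bit_length()


if __name__ == "__main__":
    for size in (90, 128):
        evenly = RING_TOP % size == 0
        print(f"ring size {size}: power of two={is_power_of_two(size)}, "
              f"divides 2^160 evenly={evenly}, "
              f"suggested size={next_power_of_two(size)}")
```

For 90 the check fails and the suggested size is 128; for 128 both checks pass. 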

Another path to resolution is to use riak-admin backup to back up your data, 
change ring_creation_size to 128, and remove the existing ring data (i.e. the 
data/ring/ subdirectory under the directory where your Riak data lives). Do 
this for each node in the cluster. Then restore the data on each node with 
riak-admin restore and rejoin the nodes to form a cluster again. Unfortunately, 
search data is not handled by the current backup and restore functionality, so 
you will need to remove the search data from data/merge-index and then reindex 
your documents. Support for backup and restore of search data is coming, but it 
is not there yet. Hope this helps.

Kelly

On Oct 28, 2011, at 12:46 PM, Gordon Tillman wrote:

> Howdy Gang,
> 
> We have a 3 node Riak 1.0.1 with search enabled and are seeing the following 
> errors in the Riak log file:
> 
> ==> /var/log/riak/console.log <==
> 2011-10-28 11:04:21.900 [error] <0.993.0> gen_fsm <0.993.0> in state 
> initialize terminated with reason: bad argument in call to erlang:hd([]) in 
> riak_core_ring:index_owner/2
> 2011-10-28 11:04:21.906 [error] <0.993.0> CRASH REPORT Process <0.993.0> with 
> 0 neighbours crashed with reason: bad argument in call to erlang:hd([]) in 
> riak_core_ring:index_owner/2
> 2011-10-28 11:04:21.930 [error] <0.257.0> Supervisor riak_kv_keys_fsm_sup had 
> child undefined started with {riak_core_coverage_fsm,start_link,undefined} at 
> <0.993.0> exit with reason bad argument in call to erlang:hd([]) in 
> riak_core_ring:index_owner/2 in context child_terminated
> 
> ==> /var/log/riak/crash.log <==
> 2011-10-28 11:04:21 =ERROR REPORT====
> ** State machine <0.993.0> terminating 
> ** Last event in was timeout
> ** When State == initialize
> **      Data  == 
> {state,undefined,riak_kv_keys_fsm,{state,plain,{raw,60225046,<0.465.0>}},3,riak_kv,all,1,{'riak_kv_listkeys_req_v3',<<"queue">>,none},60225046,undefined,0,60000,riak_kv_vnode_master}
> ** Reason for termination = 
> ** 
> {badarg,[{erlang,hd,[[]]},{riak_core_ring,index_owner,2},{riak_core_coverage_plan,'-create_plan/5-fun-0-',6},{lists,mapfoldl,3},{riak_core_coverage_fsm,initialize,2},{gen_fsm,handle_msg,7},{proc_lib,init_p_do_apply,3}]}
> 2011-10-28 11:04:21 =CRASH REPORT====
>   crasher:
>     initial call: riak_core_coverage_fsm:init/1
>     pid: <0.993.0>
>     registered_name: []
>     exception exit: 
> {badarg,[{erlang,hd,[[]]},{riak_core_ring,index_owner,2},{riak_core_coverage_plan,'-create_plan/5-fun-0-',6},{lists,mapfoldl,3},{riak_core_coverage_fsm,initialize,2},{gen_fsm,handle_msg,7},{proc_lib,init_p_do_apply,3}]}
>       in function  gen_fsm:terminate/7
>       in call from proc_lib:init_p_do_apply/3
>     ancestors: [riak_kv_keys_fsm_sup,riak_kv_sup,<0.177.0>]
>     messages: []
>     links: [<0.257.0>]
>     dictionary: []
>     trap_exit: false
>     status: running
>     heap_size: 4181
>     stack_size: 24
>     reductions: 86192
>   neighbours:
> 2011-10-28 11:04:21 =SUPERVISOR REPORT====
>      Supervisor: {local,riak_kv_keys_fsm_sup}
>      Context:    child_terminated
>      Reason:     
> {badarg,[{erlang,hd,[[]]},{riak_core_ring,index_owner,2},{riak_core_coverage_plan,'-create_plan/5-fun-0-',6},{lists,mapfoldl,3},{riak_core_coverage_fsm,initialize,2},{gen_fsm,handle_msg,7},{proc_lib,init_p_do_apply,3}]}
>      Offender:   
> [{pid,<0.993.0>},{name,undefined},{mfargs,{riak_core_coverage_fsm,start_link,undefined}},{restart_type,temporary},{shutdown,5000},{child_type,worker}]
> 
> 
> ==> /var/log/riak/erlang.log.3 <==
> 11:04:21.900 [error] gen_fsm <0.993.0> in state initialize terminated with 
> reason: bad argument in call to erlang:hd([]) in riak_core_ring:index_owner/2
> 11:04:21.906 [error] CRASH REPORT Process <0.993.0> with 0 neighbours crashed 
> with reason: bad argument in call to erlang:hd([]) in 
> riak_core_ring:index_owner/2 
> 11:04:21.930 [error] Supervisor riak_kv_keys_fsm_sup had child undefined 
> started with {riak_core_coverage_fsm,start_link,undefined} at <0.993.0> exit 
> with reason bad argument in call to erlang:hd([]) in 
> riak_core_ring:index_owner/2 in context child_terminated
> 
> ==> /var/log/riak/error.log <==
> 2011-10-28 11:04:21.900 [error] <0.993.0> gen_fsm <0.993.0> in state 
> initialize terminated with reason: bad argument in call to erlang:hd([]) in 
> riak_core_ring:index_owner/2
> 2011-10-28 11:04:21.906 [error] <0.993.0> CRASH REPORT Process <0.993.0> with 
> 0 neighbours crashed with reason: bad argument in call to erlang:hd([]) in 
> riak_core_ring:index_owner/2
> 2011-10-28 11:04:21.930 [error] <0.257.0> Supervisor riak_kv_keys_fsm_sup had 
> child undefined started with {riak_core_coverage_fsm,start_link,undefined} at 
> <0.993.0> exit with reason bad argument in call to erlang:hd([]) in 
> riak_core_ring:index_owner/2 in context child_terminated
> 
> Would anyone be able to clue me in as to the cause?  I have the log snippet, 
> status, and cluster info all available online.
> 
> LOG:
> https://gist.github.com/1322665
> 
> STATUS:
> https://gist.github.com/1322680
> 
> CLUSTER INFO:
> 
> This was too big to fit in a gist (1.7MB):
> 
> https://eval.mezeo.net/v2/files/A725C464-0194-11E1-9D55-0030485F2412/content/inline/cluster_info.txt
> 
> Thanks a bunch!
> 
> --gordon
> 
> 
> _______________________________________________
> riak-users mailing list
> [email protected]
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
