More information ...

These errors come about a minute before the errors below:

2012-09-24 12:45:15.607 [info] <0.6099.0>@riak_core_handoff_sender:start_fold:126 Starting ownership_handoff transfer of riak_search_vnode from '[email protected]' 1096126227998177188652763624537212264741949407232 to '[email protected]' 1096126227998177188652763624537212264741949407232
2012-09-24 12:45:15.607 [info] <0.6098.0>@riak_core_handoff_sender:start_fold:126 Starting ownership_handoff transfer of riak_search_vnode from '[email protected]' 1004782375664995756265033322492444576013453623296 to '[email protected]' 1004782375664995756265033322492444576013453623296
2012-09-24 12:45:15.936 [error] <0.1261.0>@mi_server:handle_info:549 Unexpected info {#Port<0.7254>,{data,[2,0,0,0,0,0,0,0,1|<<128>>]}}
2012-09-24 12:45:16.886 [error] <0.1297.0>@mi_server:handle_info:549 Unexpected info {#Port<0.7250>,{data,[2,0,0,0,0,0,0,0,1|<<0>>]}}

Incidentally, this sequence repeats about once every minute.  Memory also
appears to leak: the beam process's reserved memory grows a little each
minute until the kernel's OOM killer eventually kicks in.
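
A minimal sketch for putting a number on that growth, assuming a Linux host
where /proc/<pid>/status exposes a VmRSS line. It tracks resident set size as
a proxy for the memory described above. The function names are hypothetical
and are not part of Riak; the pid of the beam process has to be supplied by
hand.

```python
import re
import time


def parse_vm_rss_kb(status_text):
    """Return the VmRSS value in kB from /proc/<pid>/status text, or None."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def growth_kb_per_minute(samples):
    """Average growth in kB per minute between the first and last sample.

    samples is a list of (unix_seconds, rss_kb) tuples in time order.
    """
    if len(samples) < 2:
        return 0.0
    (t0, rss0), (t1, rss1) = samples[0], samples[-1]
    elapsed = t1 - t0
    if elapsed <= 0:
        return 0.0
    return (rss1 - rss0) * 60.0 / elapsed


def sample_rss(pid, count, interval_s=60):
    """Read VmRSS for pid `count` times, sleeping interval_s between reads."""
    samples = []
    for i in range(count):
        # Assumption: Linux procfs layout; this path does not exist elsewhere.
        with open("/proc/%d/status" % pid) as handle:
            rss_kb = parse_vm_rss_kb(handle.read())
        if rss_kb is not None:
            samples.append((time.time(), rss_kb))
        if i < count - 1:
            time.sleep(interval_s)
    return samples
```

Calling growth_kb_per_minute(sample_rss(pid, 10)) over ten minutes would give
a rough per-minute figure to include in a bug report.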

Rich

On Mon, Sep 24, 2012 at 12:09 PM, Rich Sutton <[email protected]> wrote:

> Hi,
>
> I've got a two node riak cluster set up for testing.  After joining the
> second node to the cluster, I've got some failing transfers.  Restarts on
> both nodes don't resolve the situation.  Any ideas?
>
> From error.log on transferrer node (sylvester.soiq.net):
>
> 2012-09-24 12:06:35.598 [error] <0.3180.0> gen_server <0.3180.0>
> terminated with reason: bad return value: lookup_timeout
> 2012-09-24 12:06:35.599 [error]
> <0.3276.0>@riak_core_handoff_sender:start_fold:215 ownership_handoff
> transfer of riak_search_vnode from '[email protected]'
> 1004782375664995756265033322492444576013453623296 to '
> [email protected]'
> 1004782375664995756265033322492444576013453623296 failed because of
> error:{badmatch,{error,{worker_crash,{bad_return_value,lookup_timeout},{fold,#Fun<merge_index_backend.1.120989340>,#Fun<riak_search_vnode.1.104462514>}}}}
> [{riak_core_handoff_sender,start_fold,5,[{file,"src/riak_core_handoff_sender.erl"},{line,161}]}]
> 2012-09-24 12:06:35.617 [error]
> <0.3277.0>@riak_core_handoff_sender:start_fold:215 ownership_handoff
> transfer of riak_search_vnode from '[email protected]'
> 1096126227998177188652763624537212264741949407232 to '
> [email protected]'
> 1096126227998177188652763624537212264741949407232 failed because of
> error:{badmatch,{error,{worker_crash,{bad_return_value,lookup_timeout},{fold,#Fun<merge_index_backend.1.120989340>,#Fun<riak_search_vnode.1.104462514>}}}}
> [{riak_core_handoff_sender,start_fold,5,[{file,"src/riak_core_handoff_sender.erl"},{line,161}]}]
> 2012-09-24 12:06:35.618 [error] <0.3180.0> CRASH REPORT Process <0.3180.0>
> with 0 neighbours exited with reason: bad return value: lookup_timeout in
> gen_server:terminate/6 line 747
> 2012-09-24 12:06:35.709 [error] <0.1293.0> Supervisor poolboy_sup had
> child riak_core_vnode_worker started with
> {riak_core_vnode_worker,start_link,undefined} at <0.3180.0> exit with
> reason bad return value: lookup_timeout in context child_terminated
> 2012-09-24 12:06:35.730 [error] <0.3181.0> gen_server <0.3181.0>
> terminated with reason: bad return value: lookup_timeout
> 2012-09-24 12:06:35.753 [error] <0.3181.0> CRASH REPORT Process <0.3181.0>
> with 0 neighbours exited with reason: bad return value: lookup_timeout in
> gen_server:terminate/6 line 747
> 2012-09-24 12:06:35.773 [error] <0.1310.0> Supervisor poolboy_sup had
> child riak_core_vnode_worker started with
> {riak_core_vnode_worker,start_link,undefined} at <0.3181.0> exit with
> reason bad return value: lookup_timeout in context child_terminated
>
>
> rich@daffyduck:~$ sudo riak-admin member-status
> Attempting to restart script through sudo -H -u riak
> ================================= Membership
> ==================================
> Status     Ring    Pending    Node
>
> -------------------------------------------------------------------------------
> valid      37.5%     50.0%    '[email protected]'
> valid      62.5%     50.0%    '[email protected]'
>
> -------------------------------------------------------------------------------
> Valid:2 / Leaving:0 / Exiting:0 / Joining:0 / Down:0
>
> rich@daffyduck:~$ sudo riak-admin transfers
> Attempting to restart script through sudo -H -u riak
> '[email protected]' waiting to handoff 8 partitions
>
> Active Transfers:
>
> transfer type: ownership_handoff
> vnode type: riak_search_vnode
> partition: 1004782375664995756265033322492444576013453623296
> started: 2012-09-24 18:24:41 [-81984015.00 us ago]
> last update: no updates seen
> objects transferred: unknown
>
>                          unknown
> [email protected] =======================> [email protected]
>                          unknown
>
> transfer type: ownership_handoff
> vnode type: riak_search_vnode
> partition: 1096126227998177188652763624537212264741949407232
> started: 2012-09-24 18:24:51 [-91982788.00 us ago]
> last update: no updates seen
> objects transferred: unknown
>
>                          unknown
> [email protected] =======================> [email protected]
>                          unknown
>
> Rich
>
>


-- 
Rich Sutton
CTO / VP of Engineering
socialiqnetworks.com
mobile 714-318-7737
_______________________________________________
riak-users mailing list
[email protected]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com