Evan,

I verified that all of the memory backends have the same ttl settings and
have done rolling restarts, but it doesn't seem to make a difference. One
thing to note, though: I remember this problem starting roughly around the
time I migrated a bucket from being backed by leveldb to being backed by
memory. I did this by setting the bucket properties via curl and letting
Riak do the migration of the objects in that bucket. Would that cause such
issues?
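
One way to check whether the failing handoffs involve the migrated bucket:
the binary in the quoted errors begins with 131,104,7,100,0,8,..., which
looks like an Erlang external term format encoding of an r_object tuple.
The Python sketch below (standard library only; the function names are
hypothetical and not part of Riak) decodes only the leading atom, bucket
and key from those bytes. It assumes the term starts with a small tuple
whose first three elements are an atom and two binaries, which holds for
the bytes in the log; it is not a general decoder.

```python
import struct

# Leading bytes of the binary in the quoted hinted_handoff error.
HEADER = bytes([
    131, 104, 7, 100, 0, 8, 114, 95, 111, 98, 106, 101, 99, 116,
    109, 0, 0, 0, 11, 69, 78, 84, 73, 84, 89, 95, 83, 69, 83, 83,
    109, 0, 0, 0, 36, 67, 54, 57, 95, 48, 48, 51, 56, 100, 56, 102,
    50, 52, 49, 52, 99, 97, 97, 54, 102, 99, 52, 56, 53, 52, 99, 99,
    101, 51, 98, 50, 48, 102, 53, 98, 52,
])

VERSION = 131          # external term format version byte
SMALL_TUPLE_EXT = 104  # 1-byte arity follows
ATOM_EXT = 100         # 2-byte big-endian length follows
BINARY_EXT = 109       # 4-byte big-endian length follows


def read_binary(data, pos):
    """Read one BINARY_EXT at pos; return (payload, next position)."""
    if data[pos] != BINARY_EXT:
        raise ValueError("expected BINARY_EXT at offset %d" % pos)
    (length,) = struct.unpack_from(">I", data, pos + 1)
    start = pos + 5
    return data[start:start + length], start + length


def parse_object_header(data):
    """Return (atom, tuple arity, bucket, key) from the start of the term."""
    if data[0] != VERSION or data[1] != SMALL_TUPLE_EXT:
        raise ValueError("not a version-131 small tuple")
    arity = data[2]
    if data[3] != ATOM_EXT:
        raise ValueError("expected ATOM_EXT as first element")
    (atom_len,) = struct.unpack_from(">H", data, 4)
    pos = 6
    atom = data[pos:pos + atom_len].decode("latin-1")
    pos += atom_len
    bucket, pos = read_binary(data, pos)
    key, pos = read_binary(data, pos)
    return atom, arity, bucket, key


if __name__ == "__main__":
    print(parse_object_header(HEADER))
```

On the bytes above this gives the atom r_object, the bucket ENTITY_SESS
and the key C69_0038d8f2414caa6fc4854cce3b20f5b4, so the bucket name in
each failing transfer can be compared against the one that was migrated.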

Thanks for your help.

-giri

On Thu, Mar 28, 2013 at 4:55 PM, Evan Vigil-McClanahan <emcclana...@basho.com> wrote:

> Giri, I've seen similar issues in the past when someone was adjusting
> their ttl setting on the memory backend.  Because one memory backend
> has it and the other does not, it fails on handoff.   The solution
> then was to make sure that all memory backend settings are the same
> and then do a rolling restart of the cluster (ignoring a lot of errors
> along the way).  I am not sure that this is applicable to your case,
> but it's something to look at.
>
> On Thu, Mar 28, 2013 at 10:22 AM, Giri Iyengar
> <giri.iyen...@sociocast.com> wrote:
> > Godefroy:
> >
> > Thanks. Your email exchange on the mailing list was what prompted me to
> > consider switching to Riak 1.3. I do see repair messages in the console
> > logs and so some healing is happening. However, there are a bunch of hinted
> > handoffs and ownership handoffs that are simply not proceeding because the
> > same vnodes keep coming up for transfer and fail. Perhaps there is a manual
> > way to forcibly repair and push the vnodes around.
> >
> > -giri
> >
> >
> > On Thu, Mar 28, 2013 at 1:19 PM, Godefroy de Compreignac
> > <godef...@eklablog.com> wrote:
> >>
> >> I have exactly the same problem with my cluster. If anyone knows what
> >> those errors mean... :-)
> >>
> >> Godefroy
> >>
> >>
> >> 2013/3/28 Giri Iyengar <giri.iyen...@sociocast.com>
> >>>
> >>> Hello,
> >>>
> >>> We are running a 6-node Riak 1.3.0 cluster in production. We recently
> >>> upgraded to 1.3. Prior to this, we were running Riak 1.2 on the same
> >>> 6-node cluster.
> >>>
> >>> We are finding that the nodes are not balanced. For instance:
> >>>
> >>> ================================= Membership ==================================
> >>> Status     Ring    Pending    Node
> >>> -------------------------------------------------------------------------------
> >>> valid       0.0%      0.0%    'riak@172.16.25.106'
> >>> valid      34.4%     20.3%    'riak@172.16.25.107'
> >>> valid      21.9%     20.3%    'riak@172.16.25.113'
> >>> valid      19.5%     20.3%    'riak@172.16.25.114'
> >>> valid       8.6%     19.5%    'riak@172.16.25.121'
> >>> valid      15.6%     19.5%    'riak@172.16.25.122'
> >>> -------------------------------------------------------------------------------
> >>> Valid:6 / Leaving:0 / Exiting:0 / Joining:0 / Down:0
> >>>
> >>>
> >>> When we look at the logs in the largest node (riak@172.16.25.107), we see
> >>> error messages that look like this:
> >>>
> >>> 2013-03-28 13:04:16.957 [error]
> >>> <0.10957.1462>@riak_core_handoff_sender:start_fold:226 hinted_handoff
> >>> transfer of riak_kv_vnode from 'riak@172.16.25.107'
> >>> 148433760041419827630061740822747494183805648896 to 'riak@172.16.25.121'
> >>> 148433760041419827630061740822747494183805648896 failed because of
> >>>
> error:{badmatch,{error,{worker_crash,{function_clause,[{riak_core_pb,encode,[{ts,{1364,476737,222223}},{{ts,{1364,476737,222223}},<<131,104,7,100,0,8,114,95,111,98,106,101,99,116,109,0,0,0,11,69,78,84,73,84,89,95,83,69,83,83,109,0,0,0,36,67,54,57,95,48,48,51,56,100,56,102,50,52,49,52,99,97,97,54,102,99,52,56,53,52,99,99,101,51,98,50,48,102,53,98,52,108,0,0,0,1,104,3,100,0,9,114,95,99,111,110,116,101,110,116,104,9,100,0,4,100,105,99,116,97,5,97,16,97,16,97,8,97,80,97,48,104,16,106,106,106,106,106,106,106,106,106,106,106,106,106,106,106,106,104,1,104,16,106,106,106,106,106,106,106,106,106,106,108,0,0,0,2,108,0,0,0,11,109,0,0,0,12,99,111,110,116,101,110,116,45,116,121,112,101,97,116,97,101,97,120,97,116,97,47,97,112,97,108,97,97,97,105,97,110,106,108,0,0,0,23,109,0,0,0,11,88,45,82,105,97,107,45,86,84,97,103,97,51,97,120,97,105,97,101,97,120,97,66,97,120,97,107,97,119,97,101,97,75,97,117,97,122,97,111,97,55,97,85,97,104,97,85,97,107,97,112,97,120,97,107,106,106,108,0,0,0,1,108,0,0,0,1,109,0,0,0,5,105,110,100,101,120,106,106,106,108,0,0,0,1,108,0,0,0,1,109,0,0,0,20,88,45,82,105,97,107,45,76,97,115,116,45,77,111,100,105,102,105,101,100,104,3,98,0,0,5,84,98,0,7,70,65,98,0,3,99,115,106,106,108,0,0,0,1,108,0,0,0,6,109,0,0,0,7,99,104,97,114,115,101,116,97,85,97,84,97,70,97,45,97,56,106,106,109,0,0,0,36,52,54,55,98,54,51,98,50,45,50,99,56,52,45,52,56,50,99,45,97,48,99,54,45,56,53,50,100,53,99,57,97,98,98,53,101,106,108,0,0,0,1,104,2,109,0,0,0,8,0,69,155,215,81,84,63,31,104,2,97,1,110,5,0,65,191,200,202,14,106,104,9,100,0,4,100,105,99,116,97,1,97,16,97,16,97,8,97,80,97,48,104,16,106,106,106,106,106,106,106,106,106,106,106,106,106,106,106,106,104,1,104,16,106,106,106,106,106,106,106,106,106,106,106,106,106,106,108,0,0,0,1,108,0,0,0,1,100,0,5,99,108,101,97,110,100,0,4,116,114,117,101,106,106,100,0,9,117,110,100,101,102,105,110,101,100>>}],[{file,"src/riak_core_pb.erl"},{line,40}]},{riak_core_pb,pack,5,...},...]},...}}}
> >>> [{riak_core_handoff_sender,start_fold,5,[{file,"src/riak_core_handoff_sender.erl"},{line,170}]}]
> >>> 2013-03-28 13:04:16.961 [error] <0.29352.909> CRASH REPORT Process
> >>> <0.29352.909> with 0 neighbours exited with reason: no function clause
> >>> matching riak_core_pb:encode({ts,{1364,476737,222223}},
> >>> {{ts,{1364,476737,222223}},<<131,104,7,100,0,8,114,95,111,98,106,101,99,116,109,0,0,0,11,69,78,...>>})
> >>> line 40 in gen_server:terminate/6 line 747
> >>>
> >>>
> >>> 2013-03-28 13:04:13.888 [error]
> >>> <0.12680.1435>@riak_core_handoff_sender:start_fold:226 ownership_handoff
> >>> transfer of riak_kv_vnode from 'riak@172.16.25.107'
> >>> 11417981541647679048466287755595961091061972992 to 'riak@172.16.25.113'
> >>> 11417981541647679048466287755595961091061972992 failed because of
> >>>
> error:{badmatch,{error,{worker_crash,{function_clause,[{riak_core_pb,encode,[{ts,{1364,458917,232318}},{{ts,{1364,458917,232318}},<<131,104,7,100,0,8,114,95,111,98,106,101,99,116,109,0,0,0,11,69,78,84,73,84,89,95,83,69,83,83,109,0,0,0,36,67,54,57,95,48,48,48,54,52,98,99,52,53,51,49,52,55,101,50,101,53,97,102,101,102,49,57,99,50,55,99,97,49,53,54,99,108,0,0,0,1,104,3,100,0,9,114,95,99,111,110,116,101,110,116,104,9,100,0,4,100,105,99,116,97,5,97,16,97,16,97,8,97,80,97,48,104,16,106,106,106,106,106,106,106,106,106,106,106,106,106,106,106,106,104,1,104,16,106,106,106,106,106,106,106,106,106,106,108,0,0,0,2,108,0,0,0,11,109,0,0,0,12,99,111,110,116,101,110,116,45,116,121,112,101,97,116,97,101,97,120,97,116,97,47,97,112,97,108,97,97,97,105,97,110,106,108,0,0,0,23,109,0,0,0,11,88,45,82,105,97,107,45,86,84,97,103,97,54,97,88,97,76,97,66,97,69,97,69,97,116,97,73,97,104,97,118,97,77,97,86,97,48,97,81,97,103,97,110,97,119,97,73,97,51,97,85,97,72,97,53,106,106,108,0,0,0,1,108,0,0,0,1,109,0,0,0,5,105,110,100,101,120,106,106,106,108,0,0,0,1,108,0,0,0,1,109,0,0,0,20,88,45,82,105,97,107,45,76,97,115,116,45,77,111,100,105,102,105,101,100,104,3,98,0,0,5,84,98,0,7,0,165,98,0,3,138,179,106,106,108,0,0,0,1,108,0,0,0,6,109,0,0,0,7,99,104,97,114,115,101,116,97,85,97,84,97,70,97,45,97,56,106,106,109,0,0,0,36,55,102,98,52,50,54,54,53,45,57,100,56,48,45,52,54,98,97,45,98,53,97,100,45,56,55,52,52,54,54,97,97,50,56,53,99,106,108,0,0,0,1,104,2,109,0,0,0,8,0,69,155,215,81,59,179,219,104,2,97,1,110,5,0,165,121,200,202,14,106,104,9,100,0,4,100,105,99,116,97,1,97,16,97,16,97,8,97,80,97,48,104,16,106,106,106,106,106,106,106,106,106,106,106,106,106,106,106,106,104,1,104,16,106,106,106,106,106,106,106,106,106,106,106,106,106,106,108,0,0,0,1,108,0,0,0,1,100,0,5,99,108,101,97,110,100,0,4,116,114,117,101,106,106,100,0,9,117,110,100,101,102,105,110,101,100>>}],[{file,"src/riak_core_pb.erl"},{line,40}]},{riak_core_pb,pack,5,[{...},...]},...]},...}}}
> >>> [{riak_core_handoff_sender,start_fold,5,[{file,"src/riak_core_handoff_sender.erl"},{line,170}]}]
> >>> 2013-03-28 13:04:14.255 [error] <0.1120.0> CRASH REPORT Process
> >>> <0.1120.0> with 0 neighbours exited with reason: no function clause
> >>> matching riak_core_pb:encode({ts,{1364,458917,232318}},
> >>> {{ts,{1364,458917,232318}},<<131,104,7,100,0,8,114,95,111,98,106,101,99,116,109,0,0,0,11,69,78,...>>})
> >>> line 40 in gen_server:terminate/6 line 747
> >>>
> >>> This has been going on for days and the cluster doesn't seem to be
> >>> rebalancing itself. We see this issue with both hinted_handoffs and
> >>> ownership_handoffs. Looks like we have some corrupt data in our cluster. I
> >>> checked through the leveldb LOGs and did not see any compaction errors.
> >>> I was hoping that upgrading to 1.3.0 would slowly start repairing the
> >>> cluster. However, that doesn't seem to be happening.
> >>>
> >>> Any help/hints would be much appreciated.
> >>>
> >>> -giri
> >>> --
> >>> GIRI IYENGAR, CTO
> >>> SOCIOCAST
> >>> Simple. Powerful. Predictions.
> >>>
> >>> 36 WEST 25TH STREET, 7TH FLOOR NEW YORK CITY, NY 10010
> >>> O: 917.525.2466x104   M: 914.924.7935   F: 347.943.6281
> >>> E: giri.iyen...@sociocast.com W: www.sociocast.com
> >>>
> >>> Facebook's Ad Guru Joins Sociocast - http://bit.ly/NjPQBQ
> >>>
> >>> _______________________________________________
> >>> riak-users mailing list
> >>> riak-users@lists.basho.com
> >>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
> >>>
> >>
> >
> >
> >
>


