Special thanks to Engel Sanchez, who quickly came up with those snippets to mark the handoffs as complete.

--
Brian Sparrow
Developer Advocate
Basho Technologies

Sent with Sparrow (http://www.sparrowmailapp.com/?sig)


On Wednesday, November 20, 2013 at 1:50 PM, Mark Phillips wrote:

> Excellent. I'm glad to hear it's cleared up.
>
> All hail Sparrow: slayer of issues; master of drum kits.
>
> Mark
>
> On Wednesday, November 20, 2013, Jeppe Toustrup wrote:
> > I've got the problem solved thanks to Brian Sparrow on the IRC channel.
> >
> > Here are the steps we tried during the troubleshooting session:
> >
> > 1. We first tried to delete the data folders on the receiving node for
> > the two partitions, while the node was stopped, to see if it would
> > retrigger the ownership handoff. It didn't change anything.
> >
> > 2. We then tried to run the following Erlang code on the sending
> > node, in order to see if it would retrigger the ownership handoff. The
> > partition IDs were those of the partitions needing to be transferred:
> >
> > IdxList = [696496874040508421956443553091353626554780352512,
> >            239777612374601260017792042867515182912301432832],
> > Mod = riak_kv,
> > Ring = riak_core_ring_manager:get_my_ring(),
> > riak_core_ring_manager:ring_trans(
> >     fun(Ring, _) ->
> >         Ring2 = lists:foldl(
> >             fun(Idx, Ring) ->
> >                 riak_core_ring:handoff_complete(Ring, Idx, Mod)
> >             end,
> >             Ring,
> >             IdxList),
> >         {new_ring, Ring2}
> >     end, []).
> >
> > That piece of code didn't help either. The output of the command showed
> > the two partitions to be in the "awaiting" state:
> >
> > [{239777612374601260017792042867515182912301432832,
> >   '[email protected]','[email protected]',
> >   [riak_kv,riak_kv_vnode,riak_pipe_vnode],
> >   awaiting},
> >  {696496874040508421956443553091353626554780352512,
> >   '[email protected]','[email protected]',
> >   [riak_kv,riak_kv_vnode,riak_pipe_vnode],
> >   awaiting}],
> >
> > 3. Brian suggested that I should run "riak_core_ring_events:force_update()."
> > in the Erlang console as well, but that didn't have any effect.
> >
> > 4. I sent the ring directories from the source and destination nodes
> > to Brian, and he came back with the following Erlang code, which
> > solved the problem for us:
> >
> > IdxList = [696496874040508421956443553091353626554780352512,
> >            239777612374601260017792042867515182912301432832],
> > Mod = riak_kv_vnode,
> > Ring = riak_core_ring_manager:get_my_ring(),
> > riak_core_ring_manager:ring_trans(
> >     fun(Ring, _) ->
> >         Ring1 = begin
> >                     A = element(7, Ring),
> >                     B = [{B1, B2, B3,
> >                           [B4E || B4E <- B4, B4E /= riak_kv],
> >                           B5} || {B1, B2, B3, B4, B5} <- A],
> >                     setelement(7, Ring, B)
> >                 end,
> >         Ring2 = lists:foldl(
> >             fun(Idx, R) ->
> >                 riak_core_ring:handoff_complete(R, Idx, Mod)
> >             end,
> >             Ring1,
> >             IdxList),
> >         {new_ring, Ring2}
> >     end, []).
> >
> > The output of the command showed the handoffs were complete:
> >
> > [{239777612374601260017792042867515182912301432832,
> >   '[email protected]','[email protected]',
> >   [riak_kv_vnode,riak_pipe_vnode],
> >   complete},
> >  {696496874040508421956443553091353626554780352512,
> >   '[email protected]','[email protected]',
> >   [riak_kv_vnode,riak_pipe_vnode],
> >   complete}],
> >
> > And I could confirm that with the usual "ring-status", "member-status"
> > and "transfers" commands. There were no pending transfers, no pending
> > ownership handoffs, and the cluster didn't show the rebalancing to be
> > in progress any more.
> >
> > Thanks a lot to Brian for helping solve this issue. I hope anybody
> > else who may encounter it can use the above info.
> >
> > --
> > Jeppe Fihl Toustrup
> > Operations Engineer
> > Falcon Social
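A note for readers working through the same symptoms: the tuples Jeppe pasted above come from the ring's pending-transfer ("next") list, and that list can be inspected read-only before attempting any ring_trans surgery. Below is a minimal sketch, run from "riak attach", assuming riak_core_ring:pending_changes/1 is exported by the riak_core release you are running; it is not part of the thread.

    %% Read-only check: list ownership transfers that are not yet complete.
    %% Each entry has the shape {Index, Owner, NextOwner, Mods, Status},
    %% where Status is typically awaiting or complete.
    {ok, Ring} = riak_core_ring_manager:get_my_ring(),
    Pending = riak_core_ring:pending_changes(Ring),
    [Entry || {_Idx, _Owner, _Next, _Mods, Status} = Entry <- Pending,
              Status =/= complete].

After a fix like the one above, this filter should come back empty even if already-completed entries still appear in the raw pending list.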
> >
> > On 20 November 2013 17:52, Mark Phillips <[email protected]> wrote:
> > > Hmm. The fact that you've disabled Search probably changes things but I'm
> > > not entirely sure how.
> > >
> > > Ryan et al - any ideas?
> > >
> > > Mark
> > >
> > > On Wednesday, November 20, 2013, Jeppe Toustrup wrote:
> > > >
> > > > Hi
> > > >
> > > > Thank you for the guide. I stopped two of the nodes (the source and
> > > > the destination of the partition transfers), renamed the folders
> > > > inside the merge_index folder and started them again. However, the
> > > > ownership handoff does not seem to be retried.
> > > >
> > > > Looking at the logs it seems like the last attempt was 48 hours ago.
> > > > Is there any logic inside Riak which causes it to give up after a
> > > > certain number of tries?
> > > > Is there a way I can retrigger the handoffs?
> > > > I have tried to set the transfer-limit on the cluster to 0 and then
> > > > back to 2, but it doesn't seem to do anything.
> > > >
> > > > I wonder if we need the merge_index folder at all, as we have disabled
> > > > Riak search since the initial configuration of the cluster. We found a
> > > > better way to query our data so that we don't need Riak search
> > > > anymore. We disabled it by resetting the properties on the buckets
> > > > where search was enabled, and then disabled search in app.config
> > > > followed by a restart of each of the nodes. This was done after the
> > > > ownership handoff issue first occurred.
> > > >
> > > > --
> > > > Jeppe Fihl Toustrup
> > > > Operations Engineer
> > > > Falcon Social
> > > >
> > > > On 19 November 2013 23:17, Mark Phillips <[email protected]> wrote:
> > > > > Hi Jeppe,
> > > > >
> > > > > As you suspected, this looks like index corruption in Search that's
> > > > > preventing handoff from finishing. Specifically, you'll need to delete
> > > > > the segment files for the two partitions' indexes and rebuild those
> > > > > indexes post-transfer.
> > > > >
> > > > > Here's the full process:
> > > > >
> > > > > - Stop each node that owns the partitions in question.
> > > > > - Delete the data directory for each partition (which contains the
> > > > >   segment files). It should be something like:
> > > > >
> > > > >   "rm -rf /var/lib/riak/merge_index/<p>"
> > > > >
> > > > > - Restart each node
> > > > > - Wait for the transfers to complete
> > > > > - Rebuild the indexes in question [1]
> > > > >
> > > > > Let us know if you run into any further issues.
> > > > >
> > > > > Mark
> > > > >
> > > > > [1] http://docs.basho.com/riak/latest/ops/running/recovery/repairing-indexes/
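For the last step of Mark's process, the repairing-indexes page in [1] amounts to triggering a per-partition index repair from the Erlang console of the node that owns each partition once the transfers finish. A minimal sketch follows; riak_search_vnode:repair/1 is an assumption here, so treat the linked doc as authoritative for the exact call on your Riak version.

    %% Hedged sketch, not from the thread: rebuild the Search (merge_index)
    %% data for the two affected partitions after handoff completes.
    %% Run from `riak attach` on the node that now owns each partition.
    %% riak_search_vnode:repair/1 is an assumption; consult the
    %% repairing-indexes doc linked above for your Riak version.
    Partitions = [696496874040508421956443553091353626554780352512,
                  239777612374601260017792042867515182912301432832],
    [riak_search_vnode:repair(P) || P <- Partitions].

In this particular thread the rebuild turned out not to matter, since Search had already been disabled on the cluster.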
> > > > >
> > > > > On Tue, Nov 19, 2013 at 4:26 AM, Jeppe Toustrup <[email protected]> wrote:
> > > > > > Hi
> > > > > >
> > > > > > I have recently added two extra nodes to the now seven-node Riak
> > > > > > cluster. The rebalancing following the expansion worked fine, except
> > > > > > for two partitions which do not seem to be able to go through. Running
> > > > > > "riak-admin ring-status" shows the following:
> > > > > >
> > > > > > ============================== Ownership Handoff ==============================
> > > > > > Owner: [email protected]
> > > > > > Next Owner: [email protected]
> > > > > >
> > > > > > Index: 239777612374601260017792042867515182912301432832
> > > > > > Waiting on: []
> > > > > > Complete: [riak_kv_vnode,riak_pipe_<
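Closing note for future readers: Jeppe's earlier question about retriggering the handoffs usually has a gentler first answer than ring surgery, namely asking each node's vnode manager to re-attempt its pending handoffs. It may not help when, as here, the ring entries themselves are stuck, but it is a low-risk first step. A minimal sketch from "riak attach", assuming riak_core_vnode_manager:force_handoffs/0 is available in your riak_core release:

    %% Hedged sketch: ask every node in the cluster to re-attempt pending
    %% handoffs. This only nudges the vnode managers; it does not modify
    %% the ring, so it is safe to run more than once.
    rpc:multicall([node() | nodes()], riak_core_vnode_manager, force_handoffs, []).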
