Re: riak core dumped, merge_index corruption?

Ryan Zezeski Tue, 06 Aug 2013 13:50:18 -0700

Deyan,

What Riak version are you running?  There was a corruption issue discovered
and fixed in the 1.4.0 release.

https://github.com/basho/riak/blob/riak-1.4.0/RELEASE-NOTES.md#issues--prs-resolved
https://github.com/basho/merge_index/pull/30

To fix it, delete the buffer files for the partitions that are having
issues.  If you look in crash.log you'll see the partition numbers of the
crashing vnodes, e.g.:

> **      Data  ==
{state,685078892498860742907977265335757665463718379520,riak_search_vnode,undefined,undefined,none,undefined,undefined,undefined,undefined,0}
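If there are many crash reports, the partition numbers can be collected with a short script. The following is a minimal Python sketch, not part of Riak: the function name `crashing_partitions` is hypothetical, and both regular expressions are assumptions based only on the log excerpts quoted in this thread (the `{state,<partition>,riak_search_vnode,...}` tuple and the `merge_index/<partition>/buffer.<n>` path). Other Riak versions may log these differently.

```python
import re

# Patterns inferred from the log lines in this thread (assumption).
STATE_RE = re.compile(r"\{state,(\d+),riak_search_vnode")
BUFFER_RE = re.compile(r"merge_index/(\d+)/buffer\.\d+")


def crashing_partitions(log_text):
    """Return the partition numbers mentioned in log_text, as strings,
    sorted numerically and without duplicates."""
    found = set(STATE_RE.findall(log_text)) | set(BUFFER_RE.findall(log_text))
    return sorted(found, key=int)


if __name__ == "__main__":
    import sys

    # Usage: python partitions.py /var/log/riak/crash.log
    with open(sys.argv[1], errors="replace") as fh:
        for partition in crashing_partitions(fh.read()):
            print(partition)
```

The partition numbers are kept as strings so they can be used directly as directory names under merge_index.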

In the directory
/storage/riak/merge_index/685078892498860742907977265335757665463718379520
you'll see buffer files.  Delete those.  Once all the bad buffers are
deleted, Riak Search should start fine.  Then upgrade to 1.4.1 to avoid
corruption in the future.  Finally, since deleting the buffers leaves you
with missing indexes, you'll want to re-index your data.
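The buffer cleanup could be scripted roughly as follows. This is a hedged Python sketch rather than a Riak tool: the helper names `buffer_files` and `delete_buffers` are hypothetical, and the `buffer.*` file name pattern is an assumption taken from the `buffer.598` name in the log. It also assumes the node is not running while files are removed. By default it only reports what it would delete.

```python
from pathlib import Path


def buffer_files(merge_index_dir, partition):
    """List buffer files in one partition directory.

    The 'buffer.*' pattern is assumed from the 'buffer.598' name in the log.
    """
    part_dir = Path(merge_index_dir) / str(partition)
    return sorted(p for p in part_dir.glob("buffer.*") if p.is_file())


def delete_buffers(merge_index_dir, partition, dry_run=True):
    """Delete the partition's buffer files and return their paths.

    With dry_run=True (the default) nothing is removed; the paths that
    would be deleted are returned so they can be reviewed first.
    """
    affected = []
    for path in buffer_files(merge_index_dir, partition):
        if not dry_run:
            path.unlink()
        affected.append(path)
    return affected


if __name__ == "__main__":
    # Directory and partition number as they appear in this thread.
    for path in delete_buffers(
        "/storage/riak/merge_index",
        "685078892498860742907977265335757665463718379520",
    ):
        print("would delete", path)
```

Review the dry-run output first, then call `delete_buffers(..., dry_run=False)` for each affected partition.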

Since only one of your nodes experienced corruption, you can use the
built-in repair functionality to re-index only the data for those
partitions.  First attach to one of your nodes.  Then, for each partition
P, run the following.

> riak_search_vnode:repair(P)

Make sure to run repair for only one partition at a time to avoid
overloading anything.

To determine when a repair is finished you can periodically call the
following.  Once it returns 'no_repair', the repair has finished.

> riak_search_vnode:repair_status(P)

Here is more information on the repair command.

http://docs.basho.com/riak/latest/cookbooks/Repairing-Search-Indexes/

-Z

On Tue, Aug 6, 2013 at 5:17 AM, Deyan Dyankov <[email protected]> wrote:

> hi,
>
> we have a 3 node cluster and one of the nodes crashed yesterday.
> Nodes are db1, db2 and db3. We started other services on db1 and db2 and
> db1 crashed. Currently db2 and db3 are fine, balanced, receiving writes and
> serving reads.
> However, db1 has issues starting. When I start the node, it outputs
> numerous errors and this finally results in a core dump. We use Riak search
> and this may be the reason for the dump. After starting the node, these are
> the first errors that are seen in the log file:
>
> […]
> 2013-08-06 11:06:08.989 [info] <0.7.0> Application erlydtl started on node
> '[email protected]'
> 2013-08-06 11:06:16.675 [warning] <0.5010.0> Corrupted posting detected in
> /storage/riak/merge_index/456719261665907161938651510223838443642478919680/buffer.598
> after reading 228149 bytes, ignoring remainder.
> 2013-08-06 11:06:18.922 [error] <0.5310.0> CRASH REPORT Process <0.5310.0>
> with 0 neighbours exited with reason: bad argument in call to
> erlang:binary_to_term(<<131,108,0,0,0,1,104,4,104,3,109,0,0,0,25,99,120,108,101,118,101,110,116,115,95,99,97,107,101,...>>)
> in mi_buffer:read_value/2 line 162 in gen_server:init_it/6 line 328
> 2013-08-06 11:06:20.751 [error] <0.5309.0> gen_fsm <0.5309.0> in state
> started terminated with reason: no function clause matching
> riak_search_vnode:terminate({{badmatch,{error,{badarg,[{erlang,binary_to_term,[<<131,108,0,0,0,1,104,4,104,3,109,0,0,0,25,...>>],...},...]}}},...},
> undefined) line 233
> […]
>
> Attached is an archive of the /var/log/riak directory. The logs there are
> for the latest starting attempt. Riak core dumped in a minute or two after
> being started.
> Is there a way to fix the merge index corruption and start the node?
>
> thank you for your efforts,
> Deyan
>
>
> _______________________________________________
> riak-users mailing list
> [email protected]
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
>

