Deyan,

What Riak version are you running? There was a corruption issue discovered and fixed in the 1.4.0 release.

https://github.com/basho/riak/blob/riak-1.4.0/RELEASE-NOTES.md#issues--prs-resolved
https://github.com/basho/merge_index/pull/30

As for fixing, you'll want to delete the buffer files for the partitions which are having issues. E.g., if you look in crash.log you'll see the partition numbers for the crashing vnodes.

> ** Data == {state,685078892498860742907977265335757665463718379520,riak_search_vnode,undefined,undefined,none,undefined,undefined,undefined,undefined,0}

In /storage/riak/merge_index/685078892498860742907977265335757665463718379520 you'll see buffer files. You'll want to delete those. After deleting all of the bad buffers, Riak Search should start fine. You'll then want to upgrade to 1.4.1 to avoid corruption in the future.

Finally, since you have to delete the buffers, you'll have missing indexes and you'll want to re-index your data. Since only one of your nodes experienced corruption, you can use the built-in repair functionality to re-index only the data for those partitions. First you'll want to attach to one of your nodes. Then, for each partition, run the following.

> riak_search_vnode:repair(P)

Make sure to run repair for only one partition at a time to avoid overloading anything. To determine when a repair is finished you can periodically call the following. Once it returns 'no_repair', the repair has finished.

> riak_search_vnode:repair_status(P)

Here is more information on the repair command.

http://docs.basho.com/riak/latest/cookbooks/Repairing-Search-Indexes/

-Z

On Tue, Aug 6, 2013 at 5:17 AM, Deyan Dyankov <[email protected]> wrote:

> hi,
>
> we have a 3 node cluster and one of the node crashed yesterday.
> Nodes are db1, db2 and db3. We started other services on db1 and db2 and
> db1 crashed. Currently db2 and db3 are fine, balanced, receiving writes and
> serving reads.
> However, db1 has issues starting. When I start the node, it outputs
> numerous errors and this finally results in a core dump.
> We use Riak search and this may be the reason for the dump. After starting
> the node, these are the first errors that are seen in the log file:
>
> […]
> 2013-08-06 11:06:08.989 [info] <0.7.0> Application erlydtl started on node '[email protected]'
> *2013-08-06 11:06:16.675 [warning] <0.5010.0> Corrupted posting detected in /storage/riak/merge_index/456719261665907161938651510223838443642478919680/buffer.598 after reading 228149 bytes, ignoring remainder.*
> 2013-08-06 11:06:18.922 [error] <0.5310.0> CRASH REPORT Process <0.5310.0> with 0 neighbours exited with reason: bad argument in call to erlang:binary_to_term(<<131,108,0,0,0,1,104,4,104,3,109,0,0,0,25,99,120,108,101,118,101,110,116,115,95,99,97,107,101,...>>) in mi_buffer:read_value/2 line 162 in gen_server:init_it/6 line 328
> 2013-08-06 11:06:20.751 [error] <0.5309.0> gen_fsm <0.5309.0> in state started terminated with reason: no function clause matching riak_search_vnode:terminate({{badmatch,{error,{badarg,[{erlang,binary_to_term,[<<131,108,0,0,0,1,104,4,104,3,109,0,0,0,25,...>>],...},...]}}},...}, undefined) line 233
> […]
>
> Attached is an archive of the /var/log/riak directory. The logs there are
> for the latest starting attempt. Riak core dumped in a minute or two after
> being started.
> Is there a way to fix the merge index corruption and start the node?
>
> thank you for your efforts,
> Deyan
>
> _______________________________________________
> riak-users mailing list
> [email protected]
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
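
If there are many crashing vnodes, collecting the partition numbers by hand gets tedious. Here is a rough Python sketch (not part of Riak) that pulls the partition numbers out of crash.log text and lists the buffer files under each partition directory so they can be reviewed before deleting. The regex is an assumption modeled on the single `{state,<partition>,riak_search_vnode,...}` line quoted above, and the function names are made up for illustration; adjust the paths to your install.

```python
import re
from pathlib import Path

# Assumption: crashing riak_search vnodes appear in crash.log as
#   {state,<partition>,riak_search_vnode,...
# which is the shape of the one example line quoted in this thread.
STATE_RE = re.compile(r"\{state,(\d+),riak_search_vnode\b")


def crashing_partitions(log_text):
    """Return partition numbers (as strings) in first-seen order, without duplicates."""
    seen = []
    for match in STATE_RE.finditer(log_text):
        partition = match.group(1)
        if partition not in seen:
            seen.append(partition)
    return seen


def buffer_files(merge_index_dir, partition):
    """List the buffer.* files for one partition. Nothing is deleted here."""
    return sorted(Path(merge_index_dir, partition).glob("buffer.*"))
```

For example, `crashing_partitions(Path("/var/log/riak/crash.log").read_text(errors="replace"))` should give the partition list, and `buffer_files("/storage/riak/merge_index", p)` lists the buffer files for one of them. It only lists files; nothing is removed.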
