Hi Ryan,
Sorry for the late reply. We are running 1.3.2, and after deleting the buffer.*
files, Riak started. However, running riak_search_vnode:repair(P)
resulted in plenty of errors like:
[…]
2013-08-08 16:36:08.261 [error] <0.1938.0> Supervisor mi_buffer_converter_sup
had child ignored started with mi_buffer_converter:start_link(<0.8157.0>,
"/storage/riak/merge_index/1050454301831586472458898473514828420377701515264",
{buffer,"/storage/riak/merge_index/1050454301831586472458898473514828420377701515264/buffer.12",...})
at <0.30960.49> exit with reason
{mi_segment,segment_already_exists,"/storage/riak/merge_index/1050454301831586472458898473514828420377701515264/segment.12"}
in context child_terminated
[…]
As our Riak Search app isn't critical yet, I decided to do a rolling restart
with rm -rf /storage/riak/merge_index/* on each node. The KV (LevelDB) + MR
functionality kept working, and new keys were properly indexed by Riak Search.
All I had to do was rewrite the old keys in Riak Search so they would be reindexed.
So everything is working now.
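For anyone hitting the same corruption, the per-node cleanup boils down to the sketch below. The paths and partition number are mocked up in a temp directory purely for illustration (a real run operates on /storage/riak/merge_index with the node stopped), and the riak stop/start steps are shown as comments:

```shell
# Sketch of the per-node cleanup, one node at a time (rolling restart).
# MERGE_INDEX stands in for /storage/riak/merge_index; the partition
# directory and files below are mocked for illustration only.
MERGE_INDEX=$(mktemp -d)
PARTITION=1050454301831586472458898473514828420377701515264
mkdir -p "$MERGE_INDEX/$PARTITION"
touch "$MERGE_INDEX/$PARTITION/buffer.12" "$MERGE_INDEX/$PARTITION/segment.12"

# riak stop                    # stop the node before touching its files
rm -rf "$MERGE_INDEX"/*        # drop the whole merge_index, as described above
# riak start                   # restart; new writes get indexed again
ls -A "$MERGE_INDEX" | wc -l   # the index dir is now empty
```

After the restart, only keys written (or rewritten) from that point on are indexed, which is why the old keys had to be rewritten.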
I am now performing a migration to Riak 1.4.1. My initial idea was to use
Yokozuna, but I noticed that it's bundled for 1.4.0 only, and I'd rather have the
fixes that come with 1.4.1. So I guess we'll have to wait a bit longer for
it.
Thanks for the help!
regards,
Deyan
On Aug 6, 2013, at 11:49 PM, Ryan Zezeski <[email protected]> wrote:
> Deyan,
>
> What Riak version are you running? There was a corruption issue discovered
> and fixed in the 1.4.0 release.
>
> https://github.com/basho/riak/blob/riak-1.4.0/RELEASE-NOTES.md#issues--prs-resolved
> https://github.com/basho/merge_index/pull/30
>
> As for fixing, you'll want to delete the buffer files for the partitions
> which are having issues. E.g. if you look in crash.log you'll see partition
> numbers for the crashing vnodes.
>
> > ** Data ==
> > {state,685078892498860742907977265335757665463718379520,riak_search_vnode,undefined,undefined,none,undefined,undefined,undefined,undefined,0}
>
> In the directory
> /storage/riak/merge_index/685078892498860742907977265335757665463718379520
> you'll see buffer files. You'll want to delete those. After deleting all
> these bad buffers Riak Search should start fine. You'll then want to upgrade
> to 1.4.1 to avoid corruption in the future. Finally, since you have to
> delete the buffers you'll have missing indexes and you'll want to re-index
> your data.
>
> Since only one of your nodes experienced corruption, you can use the built-in
> repair functionality to re-index only the data for those partitions. First
> you'll want to attach to one of your nodes. Then for each partition run the
> following.
>
> > riak_search_vnode:repair(P)
>
> Make sure to run repair for only one partition at a time to avoid overloading
> anything.
>
> To determine when a repair is finished you can periodically call the
> following. Once it returns 'no_repair', the repair has finished.
>
> > riak_search_vnode:repair_status(P)
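>
> A rough sketch of scripting both calls from the attach console (the
> self-passing fun is just a workaround for the lack of named funs in this
> Erlang release, and the 5-second poll interval is an arbitrary choice):
>
> ```erlang
> %% Run inside `riak attach`; P is one partition number from crash.log.
> P = 685078892498860742907977265335757665463718379520.
> riak_search_vnode:repair(P).
> Wait = fun(F, Part) ->
>            case riak_search_vnode:repair_status(Part) of
>                no_repair -> done;
>                _ -> timer:sleep(5000), F(F, Part)
>            end
>        end,
> Wait(Wait, P).
> ```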
>
> Here is more information on the repair command.
>
> http://docs.basho.com/riak/latest/cookbooks/Repairing-Search-Indexes/
>
>
> -Z
>
>
>
> On Tue, Aug 6, 2013 at 5:17 AM, Deyan Dyankov <[email protected]> wrote:
> hi,
>
> we have a 3-node cluster and one of the nodes crashed yesterday.
> Nodes are db1, db2 and db3. We started other services on db1 and db2 and db1
> crashed. Currently db2 and db3 are fine, balanced, receiving writes and
> serving reads.
> However, db1 has trouble starting. When I start the node, it outputs numerous
> errors and finally core dumps. We use Riak Search, and this may be the
> reason for the dump. After starting the node, these are the first errors
> seen in the log file:
>
> […]
> 2013-08-06 11:06:08.989 [info] <0.7.0> Application erlydtl started on node
> '[email protected]'
> 2013-08-06 11:06:16.675 [warning] <0.5010.0> Corrupted posting detected in
> /storage/riak/merge_index/456719261665907161938651510223838443642478919680/buffer.598
> after reading 228149 bytes, ignoring remainder.
> 2013-08-06 11:06:18.922 [error] <0.5310.0> CRASH REPORT Process <0.5310.0>
> with 0 neighbours exited with reason: bad argument in call to
> erlang:binary_to_term(<<131,108,0,0,0,1,104,4,104,3,109,0,0,0,25,99,120,108,101,118,101,110,116,115,95,99,97,107,101,...>>)
> in mi_buffer:read_value/2 line 162 in gen_server:init_it/6 line 328
> 2013-08-06 11:06:20.751 [error] <0.5309.0> gen_fsm <0.5309.0> in state
> started terminated with reason: no function clause matching
> riak_search_vnode:terminate({{badmatch,{error,{badarg,[{erlang,binary_to_term,[<<131,108,0,0,0,1,104,4,104,3,109,0,0,0,25,...>>],...},...]}}},...},
> undefined) line 233
> […]
>
> Attached is an archive of the /var/log/riak directory. The logs there are
> from the latest start attempt. Riak core dumped a minute or two after
> starting.
> Is there a way to fix the merge index corruption and start the node?
>
> thank you for your efforts,
> Deyan
>
>
> _______________________________________________
> riak-users mailing list
> [email protected]
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
>