Hi Ryan,
Sorry for the late reply. We are running 1.3.2, and after deleting the buffer.*
files, Riak started. However, running riak_search_vnode:repair(P)
resulted in plenty of errors like:
[…]
2013-08-08 16:36:08.261 [error] <0.1938.0> Supervisor mi_buffer_converter_sup
had child ignored started with mi_buffer_converter:start_link(<0.8157.0>,
"/storage/riak/merge_index/1050454301831586472458898473514828420377701515264",
{buffer,"/storage/riak/merge_index/1050454301831586472458898473514828420377701515264/buffer.12",...})
at <0.30960.49> exit with reason
{mi_segment,segment_already_exists,"/storage/riak/merge_index/1050454301831586472458898473514828420377701515264/segment.12"}
in context child_terminated
[…]
As our Riak Search app isn't critical yet, I decided to do a rolling restart
with rm -rf /storage/riak/merge_index/* on each node. The KV (LevelDB) + MR
functionality kept working, and new keys were properly indexed by Riak Search.
All I had to do was rewrite the old keys in Riak Search so they would be reindexed.
So everything is working now.
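For anyone hitting the same corruption, the per-node cleanup boils down to the sketch below. The paths and partition number are mocked up in a temp directory purely for illustration (a real run operates on /storage/riak/merge_index with the node stopped), and the riak stop/start steps are shown as comments:

```shell
# Sketch of the per-node cleanup, one node at a time (rolling restart).
# MERGE_INDEX stands in for /storage/riak/merge_index; the partition
# directory and files below are mocked for illustration only.
MERGE_INDEX=$(mktemp -d)
PARTITION=1050454301831586472458898473514828420377701515264
mkdir -p "$MERGE_INDEX/$PARTITION"
touch "$MERGE_INDEX/$PARTITION/buffer.12" "$MERGE_INDEX/$PARTITION/segment.12"

# riak stop                    # stop the node before touching its files
rm -rf "$MERGE_INDEX"/*        # drop the whole merge_index, as described above
# riak start                   # restart; new writes get indexed again
ls -A "$MERGE_INDEX" | wc -l   # the index dir is now empty
```

After the restart, only keys written (or rewritten) from that point on are indexed, which is why the old keys had to be rewritten.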
I am now performing a migration to Riak 1.4.1. My initial idea was to use
Yokozuna, but I noticed that it's bundled for 1.4.0 only, and I'd rather have the
fixes that come with 1.4.1. So I guess we'll have to wait a bit longer for
it.
Thanks for the help!
regards,
Deyan
On Aug 6, 2013, at 11:49 PM, Ryan Zezeski <[email protected]> wrote:
> Deyan,
>
> What Riak version are you running? There was a corruption issue discovered
> and fixed in the 1.4.0 release.
>
> https://github.com/basho/riak/blob/riak-1.4.0/RELEASE-NOTES.md#issues--prs-resolved
> https://github.com/basho/merge_index/pull/30
>
> As for fixing, you'll want to delete the buffer files for the partitions
> which are having issues. E.g. if you look in crash.log you'll see partition
> numbers for the crashing vnodes.
>
> > ** Data ==
> > {state,685078892498860742907977265335757665463718379520,riak_search_vnode,undefined,undefined,none,undefined,undefined,undefined,undefined,0}
>
> In the directory
> /storage/riak/merge_index/685078892498860742907977265335757665463718379520
> you'll see buffer files. You'll want to delete those. After deleting all
> these bad buffers Riak Search should start fine. You'll then want to upgrade
> to 1.4.1 to avoid corruption in the future. Finally, since you have to
> delete the buffers you'll have missing indexes and you'll want to re-index
> your data.
>
> Since only one of your nodes experienced corruption, you can use the built-in
> repair functionality to re-index only the data for those partitions. First
> you'll want to attach to one of your nodes. Then for each partition run the
> following.
>
> > riak_search_vnode:repair(P)
>
> Make sure to run repair for only one partition at a time to avoid overloading
> anything.
>
> To determine when a repair is finished you can periodically call the
> following. Once it returns 'no_repair', the repair has finished.
>
> > riak_search_vnode:repair_status(P)
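>
> A rough sketch of scripting both calls from the attach console (the
> self-passing fun is just a workaround for the lack of named funs in this
> Erlang release, and the 5-second poll interval is an arbitrary choice):
>
> ```erlang
> %% Run inside `riak attach`; P is one partition number from crash.log.
> P = 685078892498860742907977265335757665463718379520.
> riak_search_vnode:repair(P).
> Wait = fun(F, Part) ->
>            case riak_search_vnode:repair_status(Part) of
>                no_repair -> done;
>                _ -> timer:sleep(5000), F(F, Part)
>            end
>        end,
> Wait(Wait, P).
> ```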
>
> Here is more information on the repair command.
>
> http://docs.basho.com/riak/latest/cookbooks/Repairing-Search-Indexes/
>
>
> -Z
>
>
>
> On Tue, Aug 6, 2013 at 5:17 AM, Deyan Dyankov <[email protected]> wrote:
> hi,
>
> we have a 3-node cluster and one of the nodes crashed yesterday.
> Nodes are db1, db2 and db3. We started other services on db1 and db2 and db1
> crashed. Currently db2 and db3 are fine, balanced, receiving writes and
> serving reads.
> However, db1 has trouble starting. When I start the node, it outputs numerous
> errors and finally core dumps. We use Riak Search, and this may be the
> reason for the dump. After starting the node, these are the first errors
> seen in the log file:
>
> […]
> 2013-08-06 11:06:08.989 [info] <0.7.0> Application erlydtl started on node
> '[email protected]'
> 2013-08-06 11:06:16.675 [warning] <0.5010.0> Corrupted posting detected in
> /storage/riak/merge_index/456719261665907161938651510223838443642478919680/buffer.598
> after reading 228149 bytes, ignoring remainder.
> 2013-08-06 11:06:18.922 [error] <0.5310.0> CRASH REPORT Process <0.5310.0>
> with 0 neighbours exited with reason: bad argument in call to
> erlang:binary_to_term(<<131,108,0,0,0,1,104,4,104,3,109,0,0,0,25,99,120,108,101,118,101,110,116,115,95,99,97,107,101,...>>)
> in mi_buffer:read_value/2 line 162 in gen_server:init_it/6 line 328
> 2013-08-06 11:06:20.751 [error] <0.5309.0> gen_fsm <0.5309.0> in state
> started terminated with reason: no function clause matching
> riak_search_vnode:terminate({{badmatch,{error,{badarg,[{erlang,binary_to_term,[<<131,108,0,0,0,1,104,4,104,3,109,0,0,0,25,...>>],...},...]}}},...},
> undefined) line 233
> […]
>
> Attached is an archive of the /var/log/riak directory. The logs there are
> from the latest start attempt. Riak core dumped a minute or two after
> starting.
> Is there a way to fix the merge index corruption and start the node?
>
> thank you for your efforts,
> Deyan
>
>
> _______________________________________________
> riak-users mailing list
> [email protected]
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
>