Hi Grant,
Sorry for taking so long to get back to you; I had to go overseas at
short notice.
Excerpts from Grant Schofield's message of 2010-09-18 02:13:04 +1000:
> There looks like there are some strange things happening in the logs with the
> ring as well as timeouts. I would be curious how long the process takes to
> die if you were just to run a stop instead of a restart. Did you at one time
> change the name of this node?
It takes about 4.5 seconds (wall time).
I haven't changed the name of the node and the app.config is as per the
Debian package.
I should also mention that there is only one node.
> One thing that might be interesting to try is to stop the server, make a copy
> of your data directory, remove all the data in the data directory, and try to
> start and stop the node and see if it works more reliably.
I tried removing all the data from the data directory as you suggested,
and for a while it worked, but the problem has started again.
The bitcask directory is 1.5G in size and the ring directory is 24K. The
bucket properties are:
{
  "props": {
    "name": "uris",
    "n_val": 3,
    "allow_mult": false,
    "last_write_wins": false,
    "precommit": [],
    "postcommit": [],
    "chash_keyfun": {
      "mod": "riak_core_util",
      "fun": "chash_std_keyfun"
    },
    "linkfun": {
      "mod": "riak_kv_wm_link_walker",
      "fun": "mapreduce_linkfun"
    },
    "old_vclock": 86400,
    "young_vclock": 20,
    "big_vclock": 50,
    "small_vclock": 10,
    "r": "quorum",
    "w": "quorum",
    "dw": "quorum",
    "rw": "quorum"
  }
}
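For reference, here is a minimal sketch of how properties like these can be
requested with a client-side timeout, so a hung request gives up on its own.
It assumes Riak's HTTP interface is on the default 127.0.0.1:8098 and serves
buckets under /riak/<bucket> with the props and keys query parameters; the
helper names bucket_url and fetch_bucket are made up for illustration.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def bucket_url(bucket, host="127.0.0.1", port=8098, props=True, keys=False):
    """Build the bucket URL. Host, port and path are assumed defaults."""
    query = urlencode({"props": str(props).lower(), "keys": str(keys).lower()})
    return f"http://{host}:{port}/riak/{quote(bucket, safe='')}?{query}"


def fetch_bucket(bucket, timeout=30, **kwargs):
    """Fetch and decode the bucket document.

    `timeout` is a socket timeout in seconds: urlopen raises if the
    server stays silent for that long instead of blocking forever.
    """
    with urlopen(bucket_url(bucket, **kwargs), timeout=timeout) as response:
        return json.load(response)
```

Calling fetch_bucket("uris", keys=True) would ask for the key list as well as
the properties.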
I also tried to get the properties with the keys=true option but after
10 minutes of no activity (the 1, 5 and 10 minute load averages were
all zero) I killed the process. The only indication of any activity was
the following log message every minute:
ERROR REPORT==== 29-Oct-2010::04:19:43 ===
** Generic server riak_kv_vnode_master terminating
** Last message in was {'$gen_cast',
{riak_vnode_req_v1,
1141798154164767904846628775559596109106197299200,
ignore,
{riak_kv_listkeys_req_v2,<<"uris">>,92166134,
<0.2635.0>}}}
** When Server state == {state,679956,[],undefined,riak_kv_vnode,
riak_kv_legacy_vnode}
** Reason for termination ==
** {{badmatch,{error,{{badmatch,{error,emfile}},
[{bitcask,scan_key_files,3},
{bitcask,open,2},
{riak_kv_bitcask_backend,start,2},
{riak_kv_vnode,init,1},
{riak_core_vnode,init,1},
{gen_fsm,init_it,6},
{proc_lib,init_p_do_apply,3}]}}},
[{riak_core_vnode_master,get_vnode,2},
{riak_core_vnode_master,handle_cast,2},
{gen_server,handle_msg,5},
{proc_lib,init_p_do_apply,3}]}
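The emfile in the trace is the POSIX "too many open files" error, so it is
probably worth comparing how many file descriptors the Riak process holds
against its limit. A rough sketch for Linux, assuming /proc is mounted and
the script runs as a user allowed to read the process's entries; the
function names are illustrative only, and the pid has to be supplied by
hand.

```python
import os


def parse_max_open_files(limits_text):
    """Return (soft, hard) from the text of /proc/<pid>/limits.

    Values are returned as strings because a limit may read "unlimited".
    Returns None if the "Max open files" row is missing.
    """
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            fields = line.split()
            return fields[3], fields[4]
    return None


def count_open_fds(pid):
    """Count entries in /proc/<pid>/fd, or None if it cannot be read."""
    try:
        return len(os.listdir(f"/proc/{pid}/fd"))
    except OSError:
        return None


def report(pid):
    """Return a one-line summary of descriptor usage for `pid`."""
    try:
        with open(f"/proc/{pid}/limits") as handle:
            limits = parse_max_open_files(handle.read())
    except OSError:
        limits = None
    return f"pid {pid}: open fds={count_open_fds(pid)}, (soft, hard)={limits}"
```

Running report() with the pid of the Erlang VM (beam) while the node is up
should show whether the count is close to the soft limit.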
The logs can be found here:
http://stuff.roughage.com.au/riak-failure-3.log.tar.gz
rgh
> Grant Schofield
> Developer Advocate
> Basho Technologies
>
> On Sep 16, 2010, at 4:53 PM, Richard Heycock wrote:
>
> > Over the last few weeks I've been finding it harder and harder to start
> > riak which given that it's running on an auto-provisioned ec2 instance is
> > a bit of an issue! I can generally restart it by running
> > /etc/init.d/riak restart but it's got to the stage where I have to run
> > it four or five times. I should clarify here that when I say "harder to
> > start" it does start but as soon as I try to do anything it fails.
> >
> > The contents of /var/log/riak are here:
> >
> > http://stuff.roughage.com.au/riak-failure-2.log.tar.gz
> >
> > rgh
> > --
> > Richard Heycock
> >
> > http://topikality.com
> >
> > +61 (0) 410 646 369
> > [e]: [email protected]
> > [im]: [email protected]
> >
> > _______________________________________________
> > riak-users mailing list
> > [email protected]
> > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
--
Richard Heycock
http://topikality.com
+61 (0) 410 646 369
[e]: [email protected]
[im]: [email protected]