Thanks for the heads-up sounds simple enough to add to the Riak startup script in /etc/init.d
On Fri, Nov 1, 2013 at 10:30 AM, Evan Vigil-McClanahan <[email protected]> wrote: > I replied to this (accidentally) off-list. For the rest of you folks, > if you want to follow the resolution you can add yourself to the > following issue: > https://github.com/basho/bitcask/issues/99 > > In the mean time, a workaround is to remove all lockfiles > (bitcask.*.lock) from your bitcask directory before you start up. I > hope to have an actual fix in Riak 2.0. > > On Thu, Oct 31, 2013 at 7:42 AM, Konstantin Kalin > <[email protected]> wrote: >> We use riak 1.2.1 in production. Recently we had two incidents when a Riak >> node (out of 27 nodes) was in limbo state: riak-core worked fine but bitcask >> backend was constantly crashing (see below). Both times it happened >> immediately after Riak was restarted due to Linux reboot. >> >> I looked at the source code of bitcask and looks like it happens because >> there was a stuck alive process that held the lock. >> >> Major issue that this node causes timeouts on others and as result Java >> client got timeout error and bad-message code. Thus it was impossible to do >> any update for data that lands to that node as a primary replica (It's my >> guess). For reads there was significant delay ~60 seconds. We have W=3, R=2. >> >> Is it known issue ? What I can do besides of upgrading to latest version and >> hoping it will solve the issue :) >> >> Thank you, >> Konstantin. >> >> Bad Riak node. >> ================== >> 2013-10-30 21:40:27.000 [info] <0.7.0> Application erlydtl started on node >> '[email protected]' >> 2013-10-30 21:40:45.229 [info] <0.407.0>@riak_core:wait_for_service:439 Wait >> complete for service riak_kv (18 seconds) >> 2013-10-30 21:40:45.473 [error] <0.798.0> gen_fsm <0.798.0> in state active >> terminated with reason: bad return value: >> {error,{write_locked,locked,"/var/lib/riak/bitcask/1324485858831130769622089379649131486563188867072"}} >> 2013-10-30 21:40:45.485 [error] <0.798.0> CRASH REPORT Process <0.798.0> >> with 1 neighbours exited with reason: bad return value: >> {error,{write_locked,locked,"/var/lib/riak/bitcask/1324485858831130769622089379649131486563188867072"}} >> in gen_fsm:terminate/7 line 611 >> 2013-10-30 21:40:45.486 [error] <0.140.0> Supervisor riak_core_vnode_sup had >> child undefined started with {riak_core_vnode,start_link,undefined} at >> <0.798.0> exit with reason bad return value: >> {error,{write_locked,locked,"/var/lib/riak/bitcask/1324485858831130769622089379649131486563188867072"}} >> in context child_terminated >> 2013-10-30 21:40:45.487 [error] <0.845.0> gen_fsm <0.845.0> in state ready >> terminated with reason: bad return value: >> {error,{write_locked,locked,"/var/lib/riak/bitcask/1324485858831130769622089379649131486563188867072"}} >> ================== >> >> >> Another good Riak node: >> ================== >> 2013-10-30 21:37:36.348 [error] <0.15018.5661>@riak_kv_put_fsm:prepare:220 >> Unable to forward put for {<<"userSessions">>,<<"123">>} to >> '[email protected]' - nodedown >> 2013-10-30 21:37:36.348 [error] <0.14950.5661>@riak_kv_put_fsm:prepare:220 >> Unable to forward put for {<<"userSessions">>,<<"123">>} to >> '[email protected]' - nodedown >> 2013-10-30 21:41:52.313 [error] >> <0.16728.5661>@riak_api_pb_server:handle_info:170 Unrecognized message >> {11673037,{error,timeout}} >> 2013-10-30 21:41:52.313 [error] >> <0.16733.5661>@riak_api_pb_server:handle_info:170 Unrecognized message >> {105876903,{error,timeout}} >> ================== >> >> >> One more good Riak node (that temporary held vnode of bad Riak node during >> it's down): >> ================== >> 2013-10-30 21:37:37.345 [error] <0.3663.0> ** Node >> '[email protected]' not responding ** >> ** Removing (timedout) connection ** >> >> 2013-10-30 21:37:37.345 [error] <0.30864.17>@riak_kv_put_fsm:prepare:220 >> Unable to forward put for {<<"userSessions">>,<<"5457016">>} to >> '[email protected]' - nodedown >> 2013-10-30 21:41:44.358 [info] >> <0.151.0>@riak_core_handoff_manager:handle_info:271 An outbound handoff of >> partition riak_kv_vnode 1438665674247607560106752257205091097473808596992 >> was terminated for reason: {shutdown,max_concurrency} >> 2013-10-30 21:42:44.361 [info] >> <0.151.0>@riak_core_handoff_manager:handle_info:271 An outbound handoff of >> partition riak_kv_vnode 1438665674247607560106752257205091097473808596992 >> was terminated for reason: {shutdown,max_concurrency} >> 2013-10-30 21:44:30.164 [info] >> <0.151.0>@riak_core_handoff_manager:handle_info:271 An outbound handoff of >> partition riak_kv_vnode 1438665674247607560106752257205091097473808596992 >> was terminated for reason: {shutdown,max_concurrency} >> 2013-10-30 21:45:30.384 [info] >> <0.2262.18>@riak_core_handoff_sender:start_fold:126 Starting hinted_handoff >> transfer of riak_kv_vnode from '[email protected]' >> 1438665674247607560106752257205091097473808596992 to >> '[email protected]' >> 1438665674247607560106752257205091097473808596992 >> 2013-10-30 21:45:43.917 [error] >> <0.2262.18>@riak_core_handoff_sender:start_fold:215 hinted_handoff transfer >> of riak_kv_vnode from '[email protected]' >> 1438665674247607560106752257205091097473808596992 to >> '[email protected]' >> 1438665674247607560106752257205091097473808596992 failed because of >> error:{case_clause,{error,closed}} >> [{riak_core_handoff_sender,start_fold,5,[{file,"src/riak_core_handoff_sender.erl"},{line,174}]}] >> 2013-10-30 21:46:30.382 [error] >> <0.969.18>@riak_api_pb_server:handle_info:170 Unrecognized message >> {67827933,{error,timeout}} >> 2013-10-30 21:47:30.167 [info] >> <0.151.0>@riak_core_handoff_manager:handle_info:271 An outbound handoff of >> partition riak_kv_vnode 1438665674247607560106752257205091097473808596992 >> was terminated for reason: {shutdown,max_concurrency} >> >> >> _______________________________________________ >> riak-users mailing list >> [email protected] >> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com >> > > _______________________________________________ > riak-users mailing list > [email protected] > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com _______________________________________________ riak-users mailing list [email protected] http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
