I am wondering if anyone has experienced the following, which just
happened to me.

The setup is a 3-node cluster using the multi backend, with the data
actually stored in eleveldb. There is one bucket, with search enabled
using a custom schema. The cluster is lightly loaded; the load is almost
all writes, with a handful of searches and reads.
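
For context, the multi backend is configured roughly like this in
app.config (the backend name and data_root path below are from memory,
not copied from the live config):

    {riak_kv, [
        %% Route storage through the multi backend
        {storage_backend, riak_kv_multi_backend},
        %% Everything defaults to the eleveldb backend below
        {multi_backend_default, <<"eleveldb_be">>},
        {multi_backend, [
            {<<"eleveldb_be">>, riak_kv_eleveldb_backend, [
                {data_root, "/var/lib/riak/leveldb"}
            ]}
        ]}
    ]}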

This morning, while a query was running against one of the nodes, Riak
crashed. I then noticed there was no disk space left. Upon investigating, I
saw that all disk space was being consumed by the eleveldb backend: all
45GB of it. Looking at the other two nodes, I saw that they were at 1% and
4% disk usage respectively.

Anyone seen that before?

There were a few errors before the crash, but nothing that seemed out of
line:

2011-10-23 02:14:22.794 [error] <0.27137.199> gen_fsm <0.27137.199> in state wait_pipeline_shutdown terminated with reason: {sink_died,killed}
2011-10-23 02:14:22.795 [error] <0.27137.199> CRASH REPORT Process <0.27137.199> with 1 neighbours crashed with reason: {sink_died,killed}
2011-10-23 02:14:22.797 [error] <0.139.0> Supervisor riak_pipe_builder_sup had child undefined started with {riak_pipe_builder,start_link,undefined} at <0.27137.199> exit with reason {sink_died,killed} in context child_terminated
2011-10-23 02:14:22.798 [error] <0.140.0> Supervisor riak_pipe_fitting_sup had child undefined started with {riak_pipe_fitting,start_link,undefined} at <0.27138.199> exit with reason {sink_died,killed} in context child_terminated
2011-10-23 02:14:22.803 [error] <0.269.0> Supervisor riak_pipe_vnode_worker_sup had child undefined started with {riak_pipe_vnode_worker,start_link,undefined} at <0.27141.199> exit with reason fitting_died in context child_terminated
2011-10-23 02:14:22.804 [error] <0.271.0> Supervisor riak_pipe_vnode_worker_sup had child undefined started with {riak_pipe_vnode_worker,start_link,undefined} at <0.27140.199> exit with reason fitting_died in context child_terminated
2011-10-24 03:40:04.352 [error] <0.18079.4> CRASH REPORT Process <0.18079.4> with 0 neighbours crashed with reason: {error,{badmatch,{error,{file_error,
2011-10-24 20:29:09.461 [info] <0.7.0> Application lager started on node '[email protected]'

As the disk filled up, there wasn't anything else logged: the last crash
report above is cut off mid-line right where the log stops, and the [info]
line that follows is from the node starting back up.


The one thing that comes to mind is that earlier in the day I had cleared
the bucket using an MR job that deleted objects. I think about 2 million
objects were deleted.
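
The job was along these lines; this is a from-memory sketch using the
local Erlang client, with the bucket name as a placeholder, not the exact
code I ran:

    %% From-memory sketch of the cleanup job; <<"mybucket">> is a placeholder.
    {ok, C} = riak:local_client(),
    %% List every bucket/key pair via MapReduce...
    {ok, BKeys} = C:mapred_bucket(<<"mybucket">>,
                      [{reduce, {modfun, riak_kv_mapreduce, reduce_identity},
                        none, true}]),
    %% ...then delete each object individually.
    [C:delete(Bucket, Key, 1) || [Bucket, Key] <- BKeys].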