Luke,
I already use Nagios for that, but the disk space was fine before I
told one of the nodes to leave the cluster. That's my problem -- there
was not enough free space in the cluster to absorb all of that node's
data. It accepted the leave and then ran me out of disk space on all
the other nodes, with no way to abort or recover.
My only option was to add more space to the other nodes (as you said,
adding new nodes will not work until the leave is done), which is easy
enough in a virtualized environment but requires downtime. In a bare
metal environment, it could be catastrophic to the cluster.
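
To make the failure mode concrete, here is a rough sketch of the kind of check that could be run by hand before issuing a leave. It is not something Riak does or provides: the function name, the 20% headroom, and the per-node numbers are all hypothetical, and it assumes the departing nodes' data spreads evenly over the remaining nodes, which is a simplification.

```python
def can_absorb_leave(used_by_node, capacity_by_node, leaving, headroom=0.2):
    """Estimate whether the remaining nodes can hold the leaving nodes' data.

    Assumption (not Riak behaviour): the data of the leaving nodes is
    redistributed evenly across the nodes that stay. `headroom` is the
    fraction of each disk to keep free after the move.
    """
    remaining = [node for node in used_by_node if node not in leaving]
    if not remaining:
        return False

    # Total data that has to move off the leaving nodes.
    moved = sum(used_by_node[node] for node in leaving)
    extra_per_node = moved / len(remaining)

    # Every remaining node must fit its share under the headroom limit.
    for node in remaining:
        projected = used_by_node[node] + extra_per_node
        if projected > capacity_by_node[node] * (1 - headroom):
            return False
    return True


# Hypothetical 5-node cluster, 60 GB used out of 100 GB on each node.
used = {f"riak{i}": 60 for i in range(1, 6)}
capacity = {f"riak{i}": 100 for i in range(1, 6)}

print(can_absorb_leave(used, capacity, leaving={"riak5"}))
print(can_absorb_leave(used, capacity, leaving={"riak4", "riak5"}))
```

With these made-up numbers, removing one node leaves each survivor at 75 GB, which fits, while removing two pushes each survivor to 100 GB, which does not.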
On 4/15/2014 12:19, Luke Bakken wrote:
Hi Allen,
Cluster leave does not check for disk space, and in general Riak is not
aware of how much space it has available to itself (most database
systems don't monitor disk space, I think). I'll send a note to product
management about this. We recommend using a monitoring solution (like
collectd + graphite) to keep an eye on available disk space.
--
Luke Bakken
CSE
[email protected]
On Mon, Apr 14, 2014 at 10:12 AM, Allen Landsidel <[email protected]> wrote:
Luke,
As I said in the private email, I ended up doing just that. The
cluster is virtualized (I am aware of the potential performance
issues) so I just shut it all down, grew the drive allocated to
riak's data dir, and brought them back up. The extra space (or
something?) caused them to start going heavily into swap, killing
performance, so I shut down again and gave them more memory.
For now though the cluster remains off. While it was on, our SAN
performance was getting murdered. I'm having problems with one of
the arrays and I'm dealing with that right now; when it's fixed, I
can go back to figuring out how to fix the issue with the riak
cluster. I don't know right now if it was riak or the array issues
that killed the SAN performance.
I do have a few more questions though.
1. Is the cluster leave supposed to check that the remaining nodes
in the cluster have enough space to move all the data to? If not,
that's something that would be nice to have in a future version.
2. Can I tell it through the config files which filesystem(s) to
check for available space? Since this is FreeBSD, I have the normal
mounts (/, /usr, /var, /tmp) as well as one dedicated to riak data. If
it's just checking the space on the server as a whole, it will get a
false sense of how much space is available to it.
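
To illustrate the per-filesystem point in question 2, here is a minimal sketch of checking free space on one specific mount rather than on the server as a whole. It uses Python's standard shutil.disk_usage; the path /var/db/riak and the 25% threshold are placeholders, not values taken from any Riak configuration.

```python
import shutil


def free_fraction(path):
    """Return the fraction of free space on the filesystem containing `path`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return usage.free / usage.total


def has_enough_free(path, min_free=0.25):
    """True if at least `min_free` of the filesystem holding `path` is free."""
    return free_fraction(path) >= min_free


if __name__ == "__main__":
    # Hypothetical data directory; substitute the mount that holds riak's data.
    data_dir = "/var/db/riak"
    try:
        print(f"{data_dir}: {free_fraction(data_dir):.1%} free")
    except FileNotFoundError:
        print(f"{data_dir} does not exist on this machine")
```

Pointing the check at the data directory means the result reflects only the filesystem that directory lives on, so free space on /, /usr, or /tmp does not skew it.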
On 4/14/2014 12:28, Luke Bakken wrote:
Hi Allen,
There's no way to abort a cluster operation that is in progress. In
addition, data won't transfer to the node you added until the previous
cluster transition completes.
Is it possible to add disk space to your three running nodes?
--
Luke Bakken
CSE
[email protected]
On Fri, Apr 11, 2014 at 4:48 AM, Allen Landsidel <[email protected]> wrote:
I have a 5-node cluster (Riak 1.4.0, FreeBSD 9) that is being used in
production, and I miscalculated the disk space being used by the cluster
as a whole. Yesterday I told the cluster to remove two nodes, leaving
just three, but I need four active to cover the usage.
One node left successfully before I became aware of the problem, and the
disks filled up completely on the other three. I added the one that left
back to the cluster, but data is not being moved to it.
Is there any way to 'abort' the cluster leave issued to the node that is
still trying to leave, or some other way to straighten this out without
losing (much) data?
_______________________________________________
riak-users mailing list
[email protected]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com