Luke,
I understand. I was responding to the capacity-planning bit; the
cluster had more than enough capacity for day-to-day operations, but not
nearly enough to survive retiring a node.
I view it a bit differently than SQL Server and other database products,
since those are single-machine solutions. The clustering products
available for them do warn you if you attempt to retire a node from a
cluster without the resources for the other nodes to take over, or
prevent it outright when it's a hard resource limit like disk space.
Since my cluster was virtualized, I'm almost 'out of the woods'. The
retiring node is down to 6% of the cluster data and should be done
sometime tonight. After that, hopefully, I can add some smaller nodes to
the cluster and retire the three that are now far larger, disk-wise,
than I'd like them to be.
Thanks!
On 4/15/2014 12:59, Luke Bakken wrote:
Hi Allen -
I hope you don't take my response as an assignment of blame. I created
the docs issue specifically because this case is not clear, nor would I
expect Riak users to "just know" that your situation could happen when
they use "cluster leave" to remove Riak nodes from a cluster.
Every software system makes different decisions about protecting users
from potentially disruptive actions, and none protects against every
possible failure scenario. SQL Server, for instance, does not stop you
from inserting so much data that you fill a disk.
I'll also follow up with the Product team about providing more insight
into the outcome of the various "riak-admin cluster" operations.
--
Luke Bakken
CSE
[email protected] <mailto:[email protected]>
On Tue, Apr 15, 2014 at 9:44 AM, Allen Landsidel
<[email protected]> wrote:
I realize I made a mistake; it would just be nice if the UI could
warn me that I was about to do so, especially given the consequences.
If it had simply shown me how much space each node was using
(forget a percentage or anything), that would've been enough to avert
disaster. With four nodes, if they're over 25% capacity (far lower
than any sensible warning level in a monitoring system), the cluster
leave is going to fail. The more nodes you add to the system, the
lower you'd have to set that warning threshold to alert you that
you're in a state where you can't safely retire a node.
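The back-of-envelope arithmetic can be sketched as below. This is a deliberately simplified model that assumes perfectly even data distribution and ignores handoff overhead, compaction space, and the temporary extra copies made during ownership transfer, all of which push real-world safe thresholds well below the naive bound; the function names are illustrative, not anything Riak provides.

```python
# Simplified capacity model for a single-node leave (a sketch, not
# Riak's actual logic): when one of n equally loaded nodes leaves,
# its data is redistributed across the remaining n - 1 nodes.

def leave_is_safe(n_nodes, used_fraction):
    """True if one node can leave without filling the survivors' disks,
    assuming perfectly even data distribution and no overhead."""
    # Each surviving node ends up holding used * n / (n - 1) of its disk.
    after_leave = used_fraction * n_nodes / (n_nodes - 1)
    return after_leave <= 1.0

def max_safe_usage(n_nodes):
    """Highest per-node usage at which a leave still fits, in this model."""
    return (n_nodes - 1) / n_nodes
```

In this idealized model the bound for four nodes is 75%, but in practice a cluster needs far more headroom than that, which is exactly why a visible per-node usage figure (or a documented threshold) would help.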
On 4/15/2014 12:40, Luke Bakken wrote:
Hi Allen -
Failure / node leave situations should be taken into account during
cluster capacity planning. I've created an issue to more thoroughly
explain this in our documentation:
https://github.com/basho/basho_docs/issues/1034
--
Luke Bakken
CSE
[email protected] <mailto:[email protected]>
<mailto:[email protected] <mailto:[email protected]>>
On Tue, Apr 15, 2014 at 9:28 AM, Allen Landsidel
<[email protected]> wrote:
Luke,
I already do use Nagios for that, but the disk space was fine before
I told one of the nodes to leave the cluster. That's my problem --
there was not enough free space in the cluster for it to move all
that node's data. It accepted the leave and then ran me out of disk
space on all the other nodes, with no way to abort or recover.
My only option was to add more space to the other nodes (as you
said, adding new nodes will not work until the leave is done), which
is easy enough in a virtualized environment but requires downtime.
In a bare-metal environment, it could be catastrophic to the cluster.
On 4/15/2014 12:19, Luke Bakken wrote:
Hi Allen,
Cluster leave does not check for disk space and, in general, Riak
is not aware of how much space it has available to itself (most db
systems don't monitor disk space, I think). I'll send a note to
product management about this. We recommend using a monitoring
solution (like collectd + graphite) to keep an eye on available
disk space.
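The kind of check such a monitoring solution performs can be sketched in a few lines. This is a rough stand-in only; the data path and the 75% threshold are assumptions for illustration, not Riak defaults.

```python
import shutil

# Hypothetical Riak data path -- adjust for your install.
RIAK_DATA_PATH = "/var/lib/riak"

def disk_usage_fraction(path):
    """Fraction of the filesystem holding `path` that is in use."""
    total, used, _free = shutil.disk_usage(path)
    return used / total

def warn_if_low(path, threshold=0.75):
    """Print a warning when usage crosses `threshold`; return the fraction."""
    frac = disk_usage_fraction(path)
    if frac >= threshold:
        print(f"WARNING: {path} is {frac:.0%} full")
    return frac
```

Run periodically (from cron, or wrapped as a Nagios/collectd check) against each node's data directory.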
--
Luke Bakken
CSE
[email protected] <mailto:[email protected]>
<mailto:[email protected] <mailto:[email protected]>>
<mailto:[email protected] <mailto:[email protected]>
<mailto:[email protected] <mailto:[email protected]>>>
_______________________________________________
riak-users mailing list
[email protected]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com