Re: reversing node removal?

Allen Landsidel Tue, 15 Apr 2014 09:30:12 -0700

Luke,

I already use Nagios for that, but the disk space was fine before I told one of the nodes to leave the cluster. That's my problem -- there was not enough free space in the cluster for it to move all of that node's data. It accepted the leave and then ran me out of disk space on all the other nodes, with no way to abort or recover.

My only option was to add more space to the other nodes (as you said, adding new nodes will not work until the leave is done), which is easy enough in a virtualized environment but requires downtime. In a bare-metal environment, it could be catastrophic for the cluster.


On 4/15/2014 12:19, Luke Bakken wrote:

Hi Allen,

Cluster leave does not check for disk space, and in general Riak is not
aware of how much space is available to it (most database systems
don't monitor disk space, I think). I'll send a note to product
management about this. We recommend using a monitoring solution (like
collectd + Graphite) to keep an eye on available disk space.
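A minimal sketch of such a check, assuming a Python 3 host; the 20% free threshold is an arbitrary illustrative choice, not a Riak recommendation. `shutil.disk_usage()` reports on the filesystem that contains the path it is given, so pointing it at the Riak data directory measures that one mount rather than the server as a whole.

```python
from __future__ import annotations

import shutil


def check_data_dir_space(path: str, min_free_fraction: float = 0.2) -> tuple[bool, float]:
    """Return (ok, free_fraction) for the filesystem that contains `path`.

    shutil.disk_usage() reports on the filesystem holding `path`, so passing
    the Riak data directory measures that mount only, not /, /usr, /var, etc.
    The default threshold of 20% free is an illustrative assumption.
    """
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    free_fraction = usage.free / usage.total
    return free_fraction >= min_free_fraction, free_fraction
```

Calling it as `check_data_dir_space("/path/to/riak/data")` (substitute the real data directory) from cron or an existing monitoring agent is one way to wire it in; the alerting side is left to whatever tool is already in place.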


--
Luke Bakken
CSE
[email protected]


On Mon, Apr 14, 2014 at 10:12 AM, Allen Landsidel
<[email protected]> wrote:

    Luke,

    As I said in the private email, I ended up doing just that.  The
    cluster is virtualized (I am aware of the potential performance
    issues) so I just shut it all down, grew the drive allocated to
    riak's data dir, and brought them back up.  The extra space (or
    something?) caused them to start going heavily into swap, killing
    performance, so I shut down again and gave them more memory.

    For now though the cluster remains off.  While it was on, our SAN
    performance was getting murdered.  I'm having problems with one of
    the arrays and I'm dealing with that right now; when it's fixed, I
    can go back to figuring out how to fix the issue with the riak
    cluster.  I don't know right now if it was riak or the array issues
    that killed the SAN performance.

    I do have a few more questions though.

    1. Is the cluster leave supposed to check that the remaining nodes
    in the cluster have enough space to move all the data to?  If not,
    that's something that would be nice to have in a future version.

    2. Can I tell it through the config files which filesystem(s) to
    check for available space?  Since this is FreeBSD, I have the normal mounts
    (/, /usr, /var, /tmp) as well as one dedicated to riak data.  If
    it's just checking the space on the server as a whole, it will get a
    false sense of how much space is available for it.
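Since cluster leave does not check for disk space, a rough pre-flight estimate has to be done by hand before issuing the leave. The sketch below is hypothetical (`NodeDisk` and `leave_is_safe` are illustrative names, not Riak APIs) and assumes the leaving nodes' data spreads evenly over the nodes that remain, ignoring handoff and compaction overhead. The per-node `used` and `capacity` figures should come from the filesystem holding each node's Riak data directory, not the server as a whole.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class NodeDisk:
    used: int      # bytes of Riak data currently stored on the node
    capacity: int  # size of the filesystem holding the Riak data dir, in bytes


def leave_is_safe(nodes: dict[str, NodeDisk], leaving: set[str],
                  max_fill: float = 0.85) -> tuple[bool, dict[str, float]]:
    """Estimate whether the remaining nodes can absorb the leaving nodes' data.

    Assumes the departing data is spread evenly over the remaining nodes and
    ignores transfer overhead (temporary files, compaction), so a True result
    is a necessary condition, not a guarantee.
    """
    remaining = [name for name in nodes if name not in leaving]
    if not remaining:
        return False, {}
    moved = sum(nodes[name].used for name in leaving)
    share = moved / len(remaining)
    # Projected fill fraction of each remaining node after the leave completes.
    projected = {
        name: (nodes[name].used + share) / nodes[name].capacity
        for name in remaining
    }
    return all(fill <= max_fill for fill in projected.values()), projected
```

With five nodes each holding 600 units on 1000-unit disks (made-up figures), removing two nodes projects every remaining node to 100% full, while removing one projects 75%.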


    On 4/14/2014 12:28, Luke Bakken wrote:

        Hi Allen,

        There's no way to abort a cluster operation that is in progress. In
        addition, data won't transfer to the node you added until the
        previous cluster transition completes.

        Is it possible to add disk space to your three running nodes?
        --
        Luke Bakken
        CSE
        [email protected]


        On Fri, Apr 11, 2014 at 4:48 AM, Allen Landsidel
        <[email protected]> wrote:

            I have a 5-node cluster (Riak 1.4.0, FreeBSD 9) that is being used
            in production, and I miscalculated the disk space being used by the
            cluster as a whole.  Yesterday I told the cluster to remove two
            nodes, leaving just three, but I need four active to cover the
            usage.
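The sizing arithmetic behind "I need four active to cover the usage" can be sketched as follows, assuming stored data (replicas included) is spread evenly and each node should stay under a chosen fill limit. The function name, the 80% limit, and the example figures are illustrative assumptions, not values from this thread.

```python
import math


def min_nodes_needed(total_used: int, per_node_capacity: int,
                     max_fill: float = 0.8) -> int:
    """Smallest node count that keeps every node at or below `max_fill`.

    Assumes the cluster's stored data (replicas included) is spread evenly
    across nodes; the default 80% fill limit is an illustrative choice.
    """
    usable_per_node = per_node_capacity * max_fill
    return math.ceil(total_used / usable_per_node)
```

For example, 3000 GB of stored data on nodes with 1000 GB data filesystems at an 80% limit gives ceil(3000 / 800) = 4 nodes.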

            One node left successfully before I became aware of the problem,
            and the disks filled up completely on the other three.  I added the
            one that left back to the cluster, but data is not being moved to
            it.

            Is there any way to 'abort' the cluster leave issued to the node
            that is still trying to leave, or some other way to straighten this
            out without losing (much) data?

            _________________________________________________
            riak-users mailing list
            [email protected]
            http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


