Hi, Justin. Unfortunately, the application has to behave like that, though I don't like it myself. During any outage we must be completely sure that all the vnodes are up, and we cannot use any kind of replication in our case. It is simply our way of ensuring that we never access the same resource from different nodes at the same time, while still being able to access any resource even when some nodes are down.
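For what it's worth, the kind of check I have in mind looks roughly like the sketch below: compare the full ring membership against the nodes the node watcher currently sees as up, and refuse to serve requests unless they match. This assumes the application registers itself with riak_core_node_watcher under the illustrative service name my_service; it is a sketch, not tested code.

```erlang
%% Sketch: only serve requests when every ring member is visible.
%% Assumes the app has called
%%   riak_core_node_watcher:service_up(my_service, Pid)
%% on each node; the service name my_service is illustrative.
all_nodes_up() ->
    {ok, Ring} = riak_core_ring_manager:get_my_ring(),
    Members = riak_core_ring:all_members(Ring),
    Up = riak_core_node_watcher:nodes(my_service),
    lists:all(fun(N) -> lists:member(N, Up) end, Members).
```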
As for partitions, which are much less likely to happen, we'll simply shut down all the smaller partitions; only the largest one will keep running until the outage is resolved manually.

By master node, I mean the one that is used when we join new nodes using riak-admin (as far as I remember, only one node can be used for this). I believe it's the one returned by riak_core_ring:owner_node/1. Maybe I'm wrong, and when we call riak_core_gossip:send_ring(RingNode, node()) we can use any node in the cluster as RingNode?

Thank you.

On Thu, Jul 28, 2011 at 5:20 PM, Justin Sheehy <[email protected]> wrote:

> Hi, Dmitry.
>
> A couple of suggestions...
>
> The reason that you're not seeing an easy way to automatically have
> nodes be added or removed from the cluster upon going down or coming
> up is that we recommend strongly against such behavior.
>
> The idea is that intentional (administrative) outages are very
> different in nature from unintentional and potentially transitory
> outages. We have explicit administrative commands such as "join" and
> "leave" for the administrative cases, making it very easy to add or
> remove hosts to a cluster. When a node is unreachable, you often
> can't automatically tell whether it is a host problem or a network
> problem, and can't automatically tell if it is a long-term or
> short-term outage. This is why mechanisms such as quorums and hinted
> handoff exist: to ensure proper operation of the cluster as a whole
> throughout such outages. Consider the case where you have a network
> problem such that several of your nodes lose visibility to each other
> for brief and distinct periods of time. If nodes are auto-added and
> auto-removed, then you will have quite a bit of churn and potentially
> a very harmful feedback scenario. Instead of auto-adding and
> auto-removing, consider using things like riak_core_node_watcher to
> decide which nodes to interact with on a per-operation basis.
> I'm also not sure what you mean by "if the master node goes down"
> since in most riak_core applications there is no master node. Of
> course you can create such a mechanism if you need it, but (e.g.)
> Riak KV and the accompanying applications do not have any notion of a
> master node and thus do not have any such concern.
>
> I hope that this is useful.
>
> Best regards,
>
> -Justin

-- 
Best regards,
Dmitry Demeshchuk

_______________________________________________
riak-users mailing list
[email protected]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
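The per-operation approach Justin suggests in the quoted message could look roughly like this: hash the key and ask riak_core_apl for a preference list restricted to nodes the watcher currently sees as up, rather than ever auto-removing nodes from the ring. This is a sketch under the assumption that the service is registered as my_service (an illustrative name).

```erlang
%% Sketch of per-operation node selection via riak_core_node_watcher:
%% build a preference list from nodes currently running my_service
%% (illustrative service name), instead of removing down nodes.
preflist_for(BucketKey, NVal) ->
    DocIdx = riak_core_util:chash_key(BucketKey),
    %% get_apl/3 considers only nodes the node watcher reports as up.
    riak_core_apl:get_apl(DocIdx, NVal, my_service).
```

Each request would then be routed to the vnodes in the returned preference list, so transient outages degrade individual operations instead of churning cluster membership.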
