Kevin,

On Thu, Mar 14, 2013 at 1:56 PM, Kevin Burton <[email protected]> wrote:

> So that is what I am missing. If each vnode keeps an entire copy of my data
> and I have 4 physical nodes then there are 16 vnodes per physical node. That
> would mean I have the data replicated 16 times per physical node. 10 GB
> turns into 160 GB, etc. Right? So won’t I run out of disk space?
Your raw data set is replicated 3 times by default. Three different vnodes
out of your total (by default 64) will be responsible for each replica. So,
10 GB raw = 30 GB replicated.

Mark

> From: Alexander Sicular [mailto:[email protected]]
> Sent: Thursday, March 14, 2013 3:51 PM
> To: Kevin Burton
> Cc: [email protected]
> Subject: Re: Bigger data than disk space?
>
> Each vnode keeps _an entire copy_ of your data. There is no striping, which
> I think you are conflating with RAID. Default replication (also configured
> in etc/app.config) is set to three. In that case, three entire copies of
> your data are kept on three different vnodes, and if you indeed have five
> physical nodes in your cluster you are guaranteed to have each of those
> three vnodes on different physical machines.
>
> -Alexander Sicular
>
> @siculars
>
> On Mar 14, 2013, at 4:42 PM, "Kevin Burton" <[email protected]> wrote:
>
> Thank you. Let me get this straight. I have a 4-node cluster (4 physical
> machines). If I have not made any changes to the ring size, then I have 16
> (64/4) vnodes per physical node. Each physical node stores the actual data
> (the values) of about ¼ of the data set. So when querying the data with a
> key, given the number of vnodes, it can be determined which physical machine
> the data is on. There must be enough redundancy built in so that if one or
> more of the physical machines go down, the remaining physical machines can
> reconstruct the values lost by the lost vnodes. Correct so far? Now where
> does replication come in? The documentation indicates that there are 3
> copies of the data (default) made. How is this changed, and how can this
> replication of the data be taken advantage of?
>
> From: Alexander Sicular [mailto:[email protected]]
> Sent: Thursday, March 14, 2013 3:28 PM
> To: Kevin Burton
> Cc: [email protected]
> Subject: Re: Bigger data than disk space?
>
> Hi Kevin,
>
> The Riak distribution model is not based on "buckets" but rather the hash of
> the bucket/key combination. That hash (and the associated data) is then
> allocated to a "vnode". A vnode, in turn, is one of n where n is the
> ring_creation_size (default is 64; modify it in etc/app.config). Each
> physical machine in a Riak cluster claims an equal share of the ring. For
> example, a cluster with five machines (the recommended minimum for a
> production cluster) and the default ring_creation_size will have 64/5 vnodes
> per physical machine (not sure if they round down or up, but all machines
> will have about the same number of vnodes). What you would do to make more
> data available is either add a machine to the cluster whose available disk
> space is equal to or greater than that of the cluster member with the least
> total space, or increase the space on all machines already in the cluster.
>
> tl;dr: add a machine to your cluster.
>
> -Alexander Sicular
>
> @siculars
>
> On Mar 14, 2013, at 3:41 PM, Kevin Burton <[email protected]> wrote:
>
> I am relatively new to Riak, so forgive me if this has been asked before. I
> have a very thin understanding of a Riak cluster and understand somewhat
> about replication. In planning, I foresee a time when the amount of data
> exceeds the disk space that is available to a single node. What facilities
> are there to essentially “split” a bucket across several servers? How is
> this handled?

_______________________________________________
riak-users mailing list
[email protected]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
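To make the mechanics discussed in this thread concrete, here is a minimal Python sketch of how a bucket/key pair maps to vnodes and how the replica storage math works out. This is an illustration only, not Riak's implementation or API: the function names are invented, and real Riak hashes onto a 160-bit SHA-1 ring divided into equal partition ranges rather than taking a simple modulo.

```python
import hashlib

RING_SIZE = 64   # default ring_creation_size
N_VAL = 3        # default replication factor (n_val)

def partition_for(bucket: str, key: str) -> int:
    """Hash the bucket/key pair onto one of RING_SIZE partitions.
    (Simplification: Riak uses equal ranges of a 160-bit SHA-1 ring.)"""
    digest = hashlib.sha1(f"{bucket}/{key}".encode()).digest()
    return int.from_bytes(digest, "big") % RING_SIZE

def preference_list(bucket: str, key: str) -> list[int]:
    """The n_val vnodes responsible for a key: the owning partition
    plus the next n_val - 1 partitions clockwise around the ring."""
    p = partition_for(bucket, key)
    return [(p + i) % RING_SIZE for i in range(N_VAL)]

# Each of the three vnodes holds a full copy of the object -- no striping.
vnodes = preference_list("users", "kevin")
assert len(set(vnodes)) == N_VAL   # three distinct vnodes, not sixteen

# Storage math from the thread: replication multiplies raw data by n_val,
# and that replicated total is spread across the physical nodes.
raw_gb, physical_nodes = 10, 4
replicated_gb = raw_gb * N_VAL              # 30 GB cluster-wide, not 160 GB
per_node_gb = replicated_gb / physical_nodes
print(replicated_gb, per_node_gb)           # 30 total, ~7.5 per machine
```

Because replicas go to consecutive partitions while each physical node claims an interleaved share of the ring, the copies land on different machines when the cluster is large enough, so losing a node costs availability of some replicas rather than the data itself.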
