Then that is not quite as bad, but still: if I have 10 GB of data and supporting replication requires 30 GB of disk space, what if I only have 20 GB of disk space per physical node?
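To make the arithmetic in this exchange concrete, here is a back-of-envelope sketch in Python using the thread's numbers (10 GB raw, the default replication factor of 3, a 4-node cluster). It assumes, as the vnode-claim discussion later in the thread suggests, that replicas end up spread roughly evenly across the physical machines:

```python
# Back-of-envelope capacity check using the numbers from this thread.
# Assumption (not stated verbatim in the thread): replicas are spread
# roughly evenly across physical nodes, so per-node usage is the
# replicated total divided by the node count.

raw_gb = 10.0   # raw data set size
n_val = 3       # default Riak replication factor
nodes = 4       # physical machines in Kevin's cluster

replicated_gb = raw_gb * n_val        # disk used cluster-wide
per_node_gb = replicated_gb / nodes   # rough share on each machine

print(f"cluster-wide: {replicated_gb} GB, per node: ~{per_node_gb} GB")
# cluster-wide: 30.0 GB, per node: ~7.5 GB
```

In other words, it is the cluster as a whole, not each node, that must hold all three copies: 30 GB replicated spread over four nodes with 20 GB each (80 GB total) fits comfortably.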
-----Original Message-----
From: Mark Phillips [mailto:[email protected]]
Sent: Thursday, March 14, 2013 4:05 PM
To: Kevin Burton
Cc: Alexander Sicular; [email protected]
Subject: Re: Bigger data than disk space?

Kevin,

On Thu, Mar 14, 2013 at 1:56 PM, Kevin Burton <[email protected]> wrote:
> So that is what I am missing. If each vnode keeps an entire copy of my
> data and I have 4 physical nodes then there are 16 vnodes per physical
> node. That would mean I have the data replicated 16 times per physical
> node. 10 GB turns into 160 GB, etc. Right? So won't I run out of disk
> space?

Your raw data set is replicated 3 times by default. Three different
vnodes of your total (by default 64) will be responsible for each
replica. So, 10 GB raw = 30 GB replicated.

Mark

> From: Alexander Sicular [mailto:[email protected]]
> Sent: Thursday, March 14, 2013 3:51 PM
> To: Kevin Burton
> Cc: [email protected]
> Subject: Re: Bigger data than disk space?
>
> Each vnode keeps _an entire copy_ of your data. There is no striping,
> which I think you are conflating with RAID. Default replication (also
> configured in etc/app.config) is set to three. In which case, three
> entire copies of your data are kept on three different vnodes, and if
> you indeed have five physical nodes in your cluster you are guaranteed
> to have each of those three vnodes on different physical machines.
>
> -Alexander Sicular
>
> @siculars
>
> On Mar 14, 2013, at 4:42 PM, "Kevin Burton" <[email protected]> wrote:
>
> Thank you. Let me get it straight. I have a 4-node cluster (4 physical
> machines). If I have not made any changes to the ring size then I have
> 16 (64/4) vnodes. Each physical node stores the actual data (the
> value) of about ¼ of the data size. So when querying the data with a
> key, given the number of vnodes, it can be determined which physical
> machine the data is on.
> There must be enough redundancy built in so that if one or more of the
> physical machines go down, the remaining physical machines can
> reconstruct the values lost by the lost vnodes. Correct so far? Now
> where does replication come in? The documentation indicates that there
> are 3 copies of the data (default) made. How is this changed, and how
> can this replication of the data be taken advantage of?
>
> From: Alexander Sicular [mailto:[email protected]]
> Sent: Thursday, March 14, 2013 3:28 PM
> To: Kevin Burton
> Cc: [email protected]
> Subject: Re: Bigger data than disk space?
>
> Hi Kevin,
>
> The Riak distribution model is not based on "buckets" but rather the
> hash of the bucket/key combination. That hash (and associated data) is
> then allocated against a "vnode". A vnode, in turn, is one of n where
> n is the ring_creation_size (default is 64, modify in etc/app.config).
> Each physical machine in a Riak cluster claims an equal share of the
> ring. For example, a cluster with five machines (the recommended
> minimum for a production cluster) and the default ring_creation_size
> will have 64/5 vnodes per physical machine (not sure if they round
> down or up, but all machines will have about the same number of
> vnodes). What you would do to make more data available is either add
> a machine to the cluster whose available disk space is equal to or
> greater than that of the cluster member with the least amount of total
> space, or increase the space on all machines already in the cluster.
>
> tl;dr: add a machine to your cluster.
>
> -Alexander Sicular
>
> @siculars
>
> On Mar 14, 2013, at 3:41 PM, Kevin Burton <[email protected]> wrote:
>
> I am relatively new to Riak, so forgive me if this has been asked
> before. I have a very thin understanding of a Riak cluster and
> understand somewhat about replication. In planning, I foresee a time
> when the amount of data exceeds the disk space that is available to a
> single node.
> What facilities are there to essentially split a bucket across several
> servers? How is this handled?

_______________________________________________
riak-users mailing list
[email protected]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
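The distribution model Alexander describes (hash the bucket/key combination onto a ring of partitions, with the replies above noting that three different vnodes hold the replicas) can be sketched in a few lines of Python. This is a toy model, not Riak's implementation: real Riak hashes onto a 160-bit SHA-1 ring, and the function name and bucket/key values here are illustrative only.

```python
import hashlib

RING_SIZE = 64  # default ring_creation_size in etc/app.config
N_VAL = 3       # default replication factor

def preference_list(bucket: str, key: str) -> list[int]:
    """Toy version of Riak's placement: SHA-1 the bucket/key pair,
    map it to a primary partition on the ring, and take the next
    N_VAL partitions clockwise as the replica set."""
    digest = hashlib.sha1(f"{bucket}/{key}".encode()).hexdigest()
    primary = int(digest, 16) % RING_SIZE
    return [(primary + i) % RING_SIZE for i in range(N_VAL)]

prefs = preference_list("users", "kevin")
print(prefs)  # three distinct, consecutive partition ids on the 64-slot ring
```

Because the hash is deterministic, any node that receives a request for the same bucket/key computes the same three partitions, which is how a query can be routed without a central index.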

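On Alexander's point that 64/5 vnodes per machine does not divide evenly: a simplified round-robin claim (which ignores the real claim algorithm's replica-spacing constraints, so the node names and assignment rule here are assumptions for illustration) shows how the remainder falls out:

```python
RING_SIZE = 64  # default ring_creation_size
machines = ["riak1", "riak2", "riak3", "riak4", "riak5"]

# Simplified round-robin claim: partition p goes to machine p mod 5.
claim = {m: [] for m in machines}
for partition in range(RING_SIZE):
    claim[machines[partition % len(machines)]].append(partition)

counts = {m: len(ps) for m, ps in claim.items()}
print(counts)
# {'riak1': 13, 'riak2': 13, 'riak3': 13, 'riak4': 13, 'riak5': 12}
```

So "about the same number of vnodes" in practice means four machines claim 13 partitions and one claims 12, and each machine's disk should be sized for its share of the replicated total.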