Then that is not quite as bad, but still: if I have 10 GB of data and supporting replication requires 30 GB of disk space, what if I only have 20 GB of disk space per physical node?
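To make the arithmetic in this exchange concrete, here is a back-of-envelope sketch in Python using the thread's numbers (10 GB raw, the default replication factor of 3, a 4-node cluster). It assumes, as the vnode-claim discussion later in the thread suggests, that replicas end up spread roughly evenly across the physical machines:

```python
# Back-of-envelope capacity check using the numbers from this thread.
# Assumption (not stated verbatim in the thread): replicas are spread
# roughly evenly across physical nodes, so per-node usage is the
# replicated total divided by the node count.

raw_gb = 10.0   # raw data set size
n_val = 3       # default Riak replication factor
nodes = 4       # physical machines in Kevin's cluster

replicated_gb = raw_gb * n_val        # disk used cluster-wide
per_node_gb = replicated_gb / nodes   # rough share on each machine

print(f"cluster-wide: {replicated_gb} GB, per node: ~{per_node_gb} GB")
# cluster-wide: 30.0 GB, per node: ~7.5 GB
```

In other words, it is the cluster as a whole, not each node, that must hold all three copies: 30 GB replicated spread over four nodes with 20 GB each (80 GB total) fits comfortably.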
-----Original Message-----
From: Mark Phillips [mailto:[email protected]]
Sent: Thursday, March 14, 2013 4:05 PM
To: Kevin Burton
Cc: Alexander Sicular; [email protected]
Subject: Re: Bigger data than disk space?

Kevin,

On Thu, Mar 14, 2013 at 1:56 PM, Kevin Burton <[email protected]> wrote:
> So that is what I am missing. If each vnode keeps an entire copy of my
> data and I have 4 physical nodes then there are 16 vnodes per physical
> node. That would mean I have the data replicated 16 times per physical
> node. 10 GB turns into 160 GB, etc. Right? So won't I run out of disk
> space?

Your raw data set is replicated 3 times by default. Three different
vnodes of your total (by default 64) will be responsible for each
replica. So, 10 GB raw = 30 GB replicated.

Mark

> From: Alexander Sicular [mailto:[email protected]]
> Sent: Thursday, March 14, 2013 3:51 PM
> To: Kevin Burton
> Cc: [email protected]
> Subject: Re: Bigger data than disk space?
>
> Each vnode keeps _an entire copy_ of your data. There is no striping,
> which I think you are conflating with RAID. Default replication (also
> configured in etc/app.config) is set to three. In which case, three
> entire copies of your data are kept on three different vnodes, and if
> you indeed have five physical nodes in your cluster you are guaranteed
> to have each of those three vnodes on different physical machines.
>
> -Alexander Sicular
>
> @siculars
>
> On Mar 14, 2013, at 4:42 PM, "Kevin Burton" <[email protected]> wrote:
>
> Thank you. Let me get it straight. I have a 4-node cluster (4 physical
> machines). If I have not made any changes to the ring size then I have
> 16 (64/4) vnodes. Each physical node stores the actual data (the
> value) of about ¼ of the data size. So when querying the data with a
> key, given the number of vnodes, it can be determined which physical
> machine the data is on.
> There must be enough redundancy built in so that if one or more of the
> physical machines go down, the remaining physical machines can
> reconstruct the values lost by the lost vnodes. Correct so far? Now
> where does replication come in? The documentation indicates that there
> are 3 copies of the data (default) made. How is this changed, and how
> can this replication of the data be taken advantage of?
>
> From: Alexander Sicular [mailto:[email protected]]
> Sent: Thursday, March 14, 2013 3:28 PM
> To: Kevin Burton
> Cc: [email protected]
> Subject: Re: Bigger data than disk space?
>
> Hi Kevin,
>
> The Riak distribution model is not based on "buckets" but rather the
> hash of the bucket/key combination. That hash (and associated data) is
> then allocated against a "vnode". A vnode, in turn, is one of n where
> n is the ring_creation_size (default is 64, modify in etc/app.config).
> Each physical machine in a Riak cluster claims an equal share of the
> ring. For example, a cluster with five machines (the recommended
> minimum for a production cluster) and the default ring_creation_size
> will have 64/5 vnodes per physical machine (not sure if they round
> down or up, but all machines will have about the same number of
> vnodes). What you would do to make more data available is either add
> a machine to the cluster whose available disk space is equal to or
> greater than that of the cluster member with the least amount of total
> space, or increase the space on all machines already in the cluster.
>
> tl;dr: add a machine to your cluster.
>
> -Alexander Sicular
>
> @siculars
>
> On Mar 14, 2013, at 3:41 PM, Kevin Burton <[email protected]> wrote:
>
> I am relatively new to Riak, so forgive me if this has been asked
> before. I have a very thin understanding of a Riak cluster and
> understand somewhat about replication. In planning, I foresee a time
> when the amount of data exceeds the disk space that is available to a
> single node.
> What facilities are there to essentially split a bucket across several
> servers? How is this handled?

_______________________________________________
riak-users mailing list
[email protected]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
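The distribution model Alexander describes (hash the bucket/key combination onto a ring of partitions, with the replies above noting that three different vnodes hold the replicas) can be sketched in a few lines of Python. This is a toy model, not Riak's implementation: real Riak hashes onto a 160-bit SHA-1 ring, and the function name and bucket/key values here are illustrative only.

```python
import hashlib

RING_SIZE = 64  # default ring_creation_size in etc/app.config
N_VAL = 3       # default replication factor

def preference_list(bucket: str, key: str) -> list[int]:
    """Toy version of Riak's placement: SHA-1 the bucket/key pair,
    map it to a primary partition on the ring, and take the next
    N_VAL partitions clockwise as the replica set."""
    digest = hashlib.sha1(f"{bucket}/{key}".encode()).hexdigest()
    primary = int(digest, 16) % RING_SIZE
    return [(primary + i) % RING_SIZE for i in range(N_VAL)]

prefs = preference_list("users", "kevin")
print(prefs)  # three distinct, consecutive partition ids on the 64-slot ring
```

Because the hash is deterministic, any node that receives a request for the same bucket/key computes the same three partitions, which is how a query can be routed without a central index.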

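On Alexander's point that 64/5 vnodes per machine does not divide evenly: a simplified round-robin claim (which ignores the real claim algorithm's replica-spacing constraints, so the node names and assignment rule here are assumptions for illustration) shows how the remainder falls out:

```python
RING_SIZE = 64  # default ring_creation_size
machines = ["riak1", "riak2", "riak3", "riak4", "riak5"]

# Simplified round-robin claim: partition p goes to machine p mod 5.
claim = {m: [] for m in machines}
for partition in range(RING_SIZE):
    claim[machines[partition % len(machines)]].append(partition)

counts = {m: len(ps) for m, ps in claim.items()}
print(counts)
# {'riak1': 13, 'riak2': 13, 'riak3': 13, 'riak4': 13, 'riak5': 12}
```

So "about the same number of vnodes" in practice means four machines claim 13 partitions and one claims 12, and each machine's disk should be sized for its share of the replicated total.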