RE: Bigger data than disk space?

Kevin Burton Thu, 14 Mar 2013 13:57:35 -0700

So that is what I am missing. If each vnode keeps an entire copy of my data
and I have 4 physical nodes, then there are 16 vnodes per physical node. That
would mean I have the data replicated 16 times per physical node. 10 GB
turns into 160 GB, etc. Right? So won't I run out of disk space?

From: Alexander Sicular [mailto:[email protected]] 
Sent: Thursday, March 14, 2013 3:51 PM
To: Kevin Burton
Cc: [email protected]
Subject: Re: Bigger data than disk space?

Each vnode keeps _an entire copy_ of your data. There is no striping, which
I think you are conflating with RAID. Default replication (also configured
in etc/app.config) is set to three. In which case, three entire copies of
your data are kept on three different vnodes and if you indeed have five
physical nodes in your cluster you are guaranteed to have each of those
three vnodes on different physical machines.
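
To put rough numbers on the replication factor, here is a minimal sketch,
assuming "an entire copy" is read per object (each key/value lands on three
vnodes, not on every vnode) and that data spreads evenly across machines. The
function name and parameters are hypothetical, and backend overhead is
ignored.

```python
def cluster_disk_usage(data_gb, n_val=3, machines=4):
    """Estimate disk usage for a replicated data set.

    Assumptions (not from the thread): objects are spread evenly across
    machines and the storage backend adds no overhead.
    """
    total_gb = data_gb * n_val            # every object is stored n_val times
    per_machine_gb = total_gb / machines  # replicas are shared by all machines
    return total_gb, per_machine_gb


total_gb, per_machine_gb = cluster_disk_usage(10)
print(f"cluster total: {total_gb} GB, per machine: {per_machine_gb} GB")
```

Under these assumptions the footprint scales with the replication factor, not
with the number of vnodes.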

-Alexander Sicular

@siculars

On Mar 14, 2013, at 4:42 PM, "Kevin Burton" <[email protected]>
wrote:

Thank you. Let me get it straight. I have a 4 node cluster (4 physical
machines). If I have not made any changes to the ring size then I have 16
(64/4) vnodes. Each physical node stores the actual data (the value) of
about ¼ of the data size. So when querying the data with a key given the
number of vnodes it can be determined which physical machine the data is on.
There must be enough redundancy built in so that if one or more of the
physical machines go down the remaining physical machines can reconstruct
the values lost by the lost vnodes. Correct so far? Now where does
replication come in? The documentation indicates that there are 3 copies of
the data (default) made. How is this changed and how can this replication of
the data be taken advantage of?

From: Alexander Sicular [mailto:[email protected]] 
Sent: Thursday, March 14, 2013 3:28 PM
To: Kevin Burton
Cc: [email protected]
Subject: Re: Bigger data than disk space?

Hi Kevin,

The Riak distribution model is not based on "buckets" but rather the hash of
the bucket/key combination. That hash (and associated data) is then
allocated against a "vnode". A vnode, in turn, is one of n where n is the
ring_creation_size (default is 64, modify in etc/app.config). Each physical
machine in a Riak cluster claims an equal share of the ring. For example, a
cluster with five machines (the recommended minimum for a production
cluster) and the default ring_creation_size will have 64/5 vnodes per
physical machine (not sure if they round down or up but all machines will
have about the same number of vnodes). What you would do to make more data
available is either add a machine to the cluster whose available disk space
is equal to or greater than that of the cluster member with the least total
space, or increase the space on all machines already in the cluster.
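
The mapping described above can be sketched roughly as follows. This is an
illustration, not Riak's actual code: it assumes SHA-1 over a simple
"bucket/key" concatenation (Riak hashes its own internal encoding of the
pair) and a round-robin assignment of vnodes to machines (the real claim
algorithm may differ). All function names are hypothetical.

```python
import hashlib

RING_SIZE = 64       # default ring_creation_size mentioned above
KEYSPACE = 2 ** 160  # size of the SHA-1 output space


def partition_for(bucket, key, ring_size=RING_SIZE):
    """Map a bucket/key pair (bytes) to a partition index in [0, ring_size)."""
    digest = hashlib.sha1(bucket + b"/" + key).digest()
    position = int.from_bytes(digest, "big")
    # Scale the hash position down to one of ring_size equal slices.
    return position * ring_size // KEYSPACE


def owner_of(partition, machines):
    """Assumed round-robin ownership: partition i belongs to machine i mod m."""
    return machines[partition % len(machines)]


def vnodes_per_machine(machines, ring_size=RING_SIZE):
    """Count how many vnodes each machine claims under the assumption above."""
    counts = {m: 0 for m in machines}
    for p in range(ring_size):
        counts[owner_of(p, machines)] += 1
    return counts


machines = ["node1", "node2", "node3", "node4", "node5"]
p = partition_for(b"users", b"kevin")
print(p, owner_of(p, machines))
print(vnodes_per_machine(machines))
```

With five machines and 64 vnodes the counts cannot be exactly equal, so in
this sketch four machines hold 13 vnodes and one holds 12.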

tl;dr add a machine to your cluster.

-Alexander Sicular

@siculars

On Mar 14, 2013, at 3:41 PM, Kevin Burton <[email protected]> wrote:

I am relatively new to Riak so forgive me if this has been asked before. I
have a very thin understanding of a Riak cluster and understand somewhat
about replication. In planning I foresee a time when the amount of data
exceeds the disk space that is available to a single node. What facilities
are there to essentially split a bucket across several servers? How is
this handled?

_______________________________________________
riak-users mailing list
[email protected]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com