Good point. Hadoop sprays its blocks around the cluster randomly. Thus, if as
many nodes as the replication factor are down at once, some blocks cannot be
found. The larger the cluster, the higher the chance that this many nodes are
down at the same time.

To deal with this, increase the replication factor (rf) once the cluster gets
very large.
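
A rough sketch of that arithmetic, in Python. It assumes each node is down
independently with the same probability p (the 0.001 below is a made-up figure,
not a measurement) and that, with random placement, any rf nodes being down
together is enough to make some block or range unavailable. The function name
is only for illustration.

```python
from math import comb


def prob_at_least_k_down(n: int, k: int, p: float) -> float:
    """Probability that at least k of n nodes are down at the same time.

    Assumes every node is down independently with probability p
    (an illustrative simplification, not a model of real failures).
    """
    # 1 - P(fewer than k nodes down), using the binomial distribution.
    return 1.0 - sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))


if __name__ == "__main__":
    p = 0.001  # hypothetical per-node probability of being down
    for n in (10, 100, 1000):
        for rf in (3, 5):
            print(f"nodes={n:5d} rf={rf} "
                  f"P(>= rf down)={prob_at_least_k_down(n, rf, p):.3e}")
```

Under these assumptions the probability goes up as the node count grows and
goes down as rf is raised, which is the trade-off described above.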


On Wednesday, December 5, 2012, Eric Parusel <ericparu...@gmail.com> wrote:
> Hi all,
> I've been wondering about virtual nodes and how cluster uptime might
> change as cluster size increases.
> I understand clusters will benefit from increased reliability due to
> faster rebuild time, but does that hold true for large clusters?
> It seems that since (and correct me if I'm wrong here) every physical
> node will likely share some small amount of data with every other node,
> that as the count of physical nodes in a Cassandra cluster increases (let's
> say into the triple digits) that the probability of at least one failure to
> Quorum read/write occurring in a given time period would *increase*.
> Would this hold true, at least until physical nodes becomes greater than
> num_tokens per node?
>
> I understand that the window of failure for affected ranges would
> probably be small but we do Quorum reads of many keys, so we'd likely hit
> every virtual range with our queries, even if num_tokens was 256.
> Thanks,
> Eric
