If your dataset is that small, I wouldn't recommend HBase.

In any case, to scale easily from 1 to x nodes you'd have to start
with a 1-node fully distributed setup. Then, when you scale to 2 and
then 3 nodes, you'll have to set dfs.replication to 2 and then 3 as
well (unless you don't care about your data that much).
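
As a rough sketch of what that looks like, assuming a stock Hadoop and
HBase install (the hostname node1 and port 8020 are placeholders, not
anything from your setup), these are the properties involved:

```xml
<!-- hdfs-site.xml -->
<property>
  <name>dfs.replication</name>
  <!-- 1 on a single node; raise to 2, then 3, as you add nodes -->
  <value>1</value>
</property>

<!-- hbase-site.xml -->
<property>
  <name>hbase.cluster.distributed</name>
  <!-- fully distributed mode, even with only one node -->
  <value>true</value>
</property>
<property>
  <name>hbase.rootdir</name>
  <!-- placeholder NameNode address -->
  <value>hdfs://node1:8020/hbase</value>
</property>
```

Keep in mind that dfs.replication only applies to files written after
the change. Files that already exist keep their old replication factor
unless you raise it yourself, for example with something like
"hadoop fs -setrep -R 2 /hbase" (the path assumes the rootdir above).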

J-D

On Wed, Feb 8, 2012 at 1:19 AM, D S <[email protected]> wrote:
> Hi,
>
> I have this really simple question for this group.  I'm a bit unsure how
> standalone mode and distributed mode works in a way that solves this data
> set.  From what I've read, in order for distributed mode to
> work efficiently, I need around 5 servers?  Possibly 6 so one can run
> zookeeper?
>
> Anyways, I have a data set that slowly grows as time goes on.  It will
> start out really only needing one server (standalone will work quite well)
> but as months pass, the data set will grow and grow and grow.  I don't need
> 6+ servers yet but at the same time, 1 server can only last me so long.
>
> How do you usually divide a cluster that requires between 2 to 6 servers?
>
> Thanks
