If your dataset is that small, I wouldn't recommend HBase. In any case, to scale easily from 1 node to x nodes, you'd have to start with a 1-node fully distributed setup. Then, when you scale to 2 and 3 nodes, you'll have to set dfs.replication to those same values (unless you don't care about your data that much).
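For illustration, here is roughly what that setting looks like in hdfs-site.xml once a second node has been added. This is only a sketch: the value 2 assumes a two-node cluster, and the rest of your configuration is not shown.

```xml
<!-- hdfs-site.xml: illustrative fragment, assuming a 2-node cluster -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <!-- Assumed value: match the number of nodes while the cluster has 2 or 3 -->
    <value>2</value>
  </property>
</configuration>
```

As far as I know, changing dfs.replication only applies to files written after the change. Files that already exist keep their old replication factor unless you raise it explicitly, for example with hadoop fs -setrep.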
J-D

On Wed, Feb 8, 2012 at 1:19 AM, D S <[email protected]> wrote:
> Hi,
>
> I have this really simple question for this group. I'm a bit unsure how
> standalone mode and distributed mode works in a way that solves this data
> set. From what I've read, in order for distributed mode to
> work efficiently, I need around 5 servers? Possibly 6 so one can run
> zookeeper?
>
> Anyways, I have a data set that slowly grows as time goes on. It will
> start out really only needing one server (standalone will work quite well)
> but as months pass, the data set will grow and grow and grow. I don't need
> 6+ servers yet but at the same time, 1 server can only last me so long.
>
> How do you usually divide a cluster that requires between 2 to 6 servers?
>
> Thanks
