Hi, I have a simple question for this group. I'm a bit unsure how standalone mode and distributed mode would each handle my data set. From what I've read, for distributed mode to work efficiently I need around 5 servers, possibly 6 so that one can run ZooKeeper?
Anyway, I have a data set that grows slowly over time. At first it will only need one server (standalone will work quite well), but as the months pass the data set will keep growing. I don't need 6+ servers yet, but at the same time 1 server can only last me so long. How do you usually divide a cluster that requires between 2 and 6 servers? Thanks
