Regards, Raj. Knowing the data that you want to process with Hadoop, or at least an approximation of its volume, is critical for this. I think that Hadoop Operations is an invaluable resource for this kind of planning.
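
Just to show why even a rough estimate matters, here is a back-of-the-envelope sizing sketch. All of the input figures in it (daily ingest, retention, number of worker nodes) are made-up assumptions for illustration; the only Hadoop-specific numbers are the default 3x HDFS replication factor and some extra scratch space for MapReduce intermediate data.

// Back-of-the-envelope HDFS sizing sketch. The input figures below are
// hypothetical; swap in your own once you know the data for the PoC.
public class ClusterSizingSketch {
    public static void main(String[] args) {
        double dailyIngestGb = 50;    // assumed raw data per day (hypothetical)
        int retentionDays    = 365;   // assumed retention period (hypothetical)
        int replication      = 3;     // HDFS default replication factor
        double tempOverhead  = 0.25;  // rough scratch space for MapReduce intermediates
        int dataNodes        = 4;     // a 5-node cluster: 1 NN/JT + 4 DN/TT

        double rawGb     = dailyIngestGb * retentionDays;   // 18,250 GB
        double hdfsGb    = rawGb * replication;             // 54,750 GB
        double totalGb   = hdfsGb * (1 + tempOverhead);     // 68,437.5 GB
        double perNodeGb = totalGb / dataNodes;             // ~17,109 GB per DataNode

        System.out.printf("Raw data:        %,.0f GB%n", rawGb);
        System.out.printf("After 3x repl.:  %,.0f GB%n", hdfsGb);
        System.out.printf("Plus temp space: %,.0f GB%n", totalGb);
        System.out.printf("Per DataNode:    %,.0f GB (~%.1f TB)%n",
                          perNodeGb, perNodeGb / 1024.0);
    }
}

Once you plug in your own figures, the per-DataNode number is what drives the disk sizing for the PoC.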
With that in mind, a few general guidelines:

- Hadoop uses RAM heavily, so the first resource to consider is giving the nodes as much RAM as you can, with a particular focus on the NameNode/JobTracker node.
- For the DataNode/TaskTracker nodes, it is very good to have fast disks. SSDs are great but expensive, so weigh that against your budget; for me, the Seagate Barracuda drives are awesome.
- A good network connection between the nodes. Hadoop is an RPC-based platform, so a good network is critical for a healthy cluster.

A good starting point, for me, for a small cluster is:
- NN/JT: 8 to 16 GB RAM
- DN/TT: 4 to 8 GB RAM

Always consider using compression to optimize the communication between all the services in your Hadoop cluster (Snappy is my favorite); I have put a small configuration sketch at the end of this mail.

All of this advice is in the Hadoop Operations book by Eric Sammer, so it's a must-read for every Hadoop systems engineer.

2013/4/29 Raj Hadoop <[email protected]>

> Hi,
>
> I have to propose some hardware requirements in my company for a Proof of
> Concept with Hadoop. I was reading Hadoop Operations and also saw the
> Cloudera website. But I just wanted to know from the group - what are the
> requirements if I have to plan for a 5-node cluster? I don't know at this
> time the data that needs to be processed for the Proof of Concept. So - can
> you suggest something to me?
>
> Regards,
> Raj

--
Marcos Ortiz Valmaseda,
Data-Driven Product Manager at PDVSA
Blog: http://dataddict.wordpress.com/
LinkedIn: http://www.linkedin.com/in/marcosluis2186
Twitter: @marcosluis2186 <http://twitter.com/marcosluis2186>
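
P.S. Since compression came up: here is a minimal sketch of how you could turn on Snappy for the intermediate map output (and, optionally, the final job output) from a job's driver. It assumes Hadoop 1.x / MRv1 property names, which match the JobTracker/TaskTracker setup above; the class and property names are the standard Apache Hadoop ones, the wrapper class itself is just illustrative.

import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.SnappyCodec;
import org.apache.hadoop.mapred.JobConf;

// Minimal MRv1 driver fragment: enable Snappy compression for the shuffle
// (map output) and for the final job output.
public class SnappyJobConfSketch {
    public static JobConf configure(JobConf conf) {
        // Compress intermediate map output; this is what cuts shuffle
        // traffic between TaskTrackers on the network.
        conf.setBoolean("mapred.compress.map.output", true);
        conf.setClass("mapred.map.output.compression.codec",
                      SnappyCodec.class, CompressionCodec.class);

        // Optionally compress the job's final output in HDFS as well.
        conf.setBoolean("mapred.output.compress", true);
        conf.setClass("mapred.output.compression.codec",
                      SnappyCodec.class, CompressionCodec.class);
        return conf;
    }
}

The same properties can also be set cluster-wide in mapred-site.xml instead of per job, and keep in mind that SnappyCodec needs the native Snappy libraries installed on the nodes.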
