Hi all, I'm using Nutch with Hadoop with great pleasure. It works great and really increases crawling performance across multiple machines.
I have two strong machines and two older machines that I would like to use. So far I've been running Hadoop on only the two strong machines. Now I would like to add the two less powerful machines to do some processing as well.

My question: right now HDFS is shared between the two powerful machines. I don't want the other two machines to store any content, as they have slow and unreliable hard disks. I just want them to do processing (i.e. MapReduce) and not store any data.

Is that possible, or do I have to run HDFS on all machines that do processing? If it is possible to use a machine only for MapReduce, how is this done?

Thank you for your help,
Eran