Would it be recommended to use Hadoop for crawling (100 sites with 1,000 pages each) on a single machine? What would the benefit be? I mean a setup like the one described at http://wiki.apache.org/nutch/NutchHadoopTutorial, but running on a single machine.
Or is the simple crawl/recrawl approach (without Hadoop, as described in the Nutch tutorial on the wiki: http://wiki.apache.org/nutch/NutchTutorial, plus the recrawl script from the wiki) the way to go? Thanks, Tomislav
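
P.S. To be clear, by the "simple" approach I mean roughly the single-machine sequence from the NutchTutorial page (this is from memory of the Nutch 1.x commands, so the exact flags and the -topN value are just placeholders for my setup):

    # seed URLs in urls/seed.txt, then a one-shot crawl to the local filesystem
    bin/nutch crawl urls -dir crawl -depth 3 -topN 1000

followed later by the recrawl script from the wiki to refresh the crawl directory.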
