Would it be recommended to use Hadoop for crawling (100 sites with 1000
pages each) on a single machine? What would be the benefit?
Something like what is described at
http://wiki.apache.org/nutch/NutchHadoopTutorial, but on a single
machine.


Or is the simple crawl/recrawl approach (without Hadoop, as described in
the Nutch tutorial on the wiki at
http://wiki.apache.org/nutch/NutchTutorial, plus the recrawl script from
the wiki) the way to go?
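
For concreteness, the simple single-machine approach would be roughly
the following, assuming a Nutch 1.x-era binary distribution and a seed
list in a urls/ directory (the -depth and -topN values here are only
illustrative, not tuned for this workload):

  # one-shot crawl from the Nutch tutorial: inject, generate/fetch/parse
  # loops, linkdb, and indexing in a single command, run in local mode
  bin/nutch crawl urls -dir crawl -depth 5 -topN 1000

with the recrawl script from the wiki run afterwards on whatever
schedule the sites require.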

Thanks,
       Tomislav
