Hi Tomislav,

The NutchTutorial is the way to go. Fetching 100 sites of 1000 pages each with a single machine should definitely be fine. You might want to add more machines if many people are searching your index.
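
In case it helps, here is a rough sketch of the single-machine commands from the NutchTutorial (the seed URL, depth, and topN values are only examples, and the exact flags can differ between Nutch versions):

    # seed list: one URL per line
    mkdir urls
    echo "http://www.example.com/" > urls/seed.txt

    # one-shot crawl: inject the seeds, then repeat
    # generate/fetch/update for -depth rounds
    bin/nutch crawl urls -dir crawl -depth 3 -topN 1000

A recrawl script like the one on the wiki essentially repeats the same generate/fetch/update cycle against the existing crawl db, something like:

    # one recrawl round, assuming the layout produced by the crawl above
    bin/nutch generate crawl/crawldb crawl/segments -topN 1000
    segment=`ls -d crawl/segments/* | tail -1`
    bin/nutch fetch $segment
    bin/nutch updatedb crawl/crawldb $segment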

BTW, Nutch "always" uses Hadoop. When testing locally or when running on only one machine, Hadoop just uses the local file system, so even the NutchTutorial uses Hadoop.
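
You can see this in the Hadoop configuration: with nothing overridden in hadoop-site.xml, the Hadoop versions Nutch shipped with fall back to local defaults, roughly:

    <property>
      <name>fs.default.name</name>
      <value>file:///</value>      <!-- local file system, no HDFS -->
    </property>
    <property>
      <name>mapred.job.tracker</name>
      <value>local</value>         <!-- run MapReduce in-process -->
    </property>

Only when you point fs.default.name at an HDFS namenode and mapred.job.tracker at a jobtracker does Nutch start using a real cluster.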

HTH,
Renaud

Would it be recommended to use Hadoop for crawling (100 sites with 1000
pages each) on a single machine? What would be the benefit?
Something like what is described at
http://wiki.apache.org/nutch/NutchHadoopTutorial, but on a single
machine.


Or is the simple crawl/recrawl (without Hadoop, as described in the Nutch
tutorial on the wiki: http://wiki.apache.org/nutch/NutchTutorial, plus the
recrawl script from the wiki) the way to go?

Thanks,
       Tomislav


