Hi Tomislav,
The Nutch Tutorial is the way to go. Fetching 100 sites of 1000 pages
each with a single machine should definitely be fine. You might want to
add more machines if a large number of people are searching your index.
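For that scale, the one-step crawl command from the NutchTutorial is
all you need. A rough sketch (the "urls" seed directory and the -depth
and -topN values here are only illustrative, tune them for your sites):

    # crawl the seed list, following links 3 hops deep and keeping
    # at most the 1000 top-scoring pages per round
    bin/nutch crawl urls -dir crawl -depth 3 -topN 1000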
BTW, Nutch "always" uses Hadoop. When testing locally or running on
only one machine, Hadoop simply runs in local mode and uses the local
file system instead of HDFS. So even the NutchTutorial setup is using
Hadoop.
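For reference, local mode is just Hadoop's default configuration.
Roughly (property names from the Hadoop 0.x line that Nutch ships with;
exact defaults can vary by version):

    <!-- hadoop-site.xml: leaving these at their defaults keeps
         Hadoop on the local file system, with MapReduce jobs
         running in-process rather than on a cluster -->
    <property>
      <name>fs.default.name</name>
      <value>file:///</value>
    </property>
    <property>
      <name>mapred.job.tracker</name>
      <value>local</value>
    </property>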
HTH,
Renaud
Would it be recommended to use Hadoop for crawling (100 sites with 1000
pages each) on a single machine? What would be the benefit?
Something like what is described at
http://wiki.apache.org/nutch/NutchHadoopTutorial, but on a single
machine.
Or is the simple crawl/recrawl (without Hadoop, as described in the
Nutch tutorial on the wiki, http://wiki.apache.org/nutch/NutchTutorial,
plus the recrawl script from the wiki) the way to go?
Thanks,
Tomislav