Thanks Doğacan. I set -numFetchers to 3, but the fetch still runs from only one host at a time rather than from all three at once. This is what I ran:

-bash-3.00$ bin/nutch generate crawl/crawldb crawl/segments -numFetchers 3
Generator: Selecting best-scoring urls due for fetch.
Generator: starting
Generator: segment: crawl/segments/20080918173443
Generator: filtering: true
Generator: Partitioning selected urls by host, for politeness.
Generator: done.
-bash-3.00$ bin/nutch fetch crawl/segments/20080918173443
Fetcher: starting
Fetcher: segment: crawl/segments/20080918173443

> Date: Thu, 18 Sep 2008 18:34:26 +0300
> From: [EMAIL PROTECTED]
> To: [email protected]
> Subject: Re: running fetches in hadoop
>
> Hi,
>
> On Thu, Sep 18, 2008 at 5:23 PM, Edward Quick <[EMAIL PROTECTED]> wrote:
> >
> > I have 3 hosts in a hadoop cluster and noticed that the fetch only runs
> > from one host at a time.
> > Is that right or should the fetch run from all 3 hosts at the same time?
> >
>
> Try running generate like this:
>
> bin/nutch generate <other options> -numFetchers 3
>
> > Thanks,
> >
> > Ed.
>
> --
> Doğacan Güney
