Also forgot to mention, what should mapred.map.tasks and mapred.reduce.tasks be set to?

Thanks,
Ed.

From: [EMAIL PROTECTED]
To: [email protected]
Subject: RE: running fetches in hadoop
Date: Thu, 18 Sep 2008 19:36:45 +0000

> 2008/9/18 Edward Quick <[EMAIL PROTECTED]>:
> >
> > Thanks Doğacan,
> >
> > I set numFetchers but only see the fetch being done from one host at one
> > time, not all at the same time.
> > This is what I ran:
> >
> > -bash-3.00$ bin/nutch generate crawl/crawldb crawl/segments -numFetchers 3
> > Generator: Selecting best-scoring urls due for fetch.
> > Generator: starting
> > Generator: segment: crawl/segments/20080918173443
> > Generator: filtering: true
> > Generator: Partitioning selected urls by host, for politeness.
> > Generator: done.
> > -bash-3.00$ bin/nutch fetch crawl/segments/20080918173443
> > Fetcher: starting
> > Fetcher: segment: crawl/segments/20080918173443
> >
>
> Hmm, how many parts are under crawl/segments/20080918173443/crawl_generate?

-bash-3.00$ bin/hadoop dfs -ls crawl/segments/20080918173443/crawl_generate
Found 3 items
/user/nutch/crawl/segments/20080918173443/crawl_generate/part-00000 <r 1> 86 2008-09-18 17:35 rw-r--r-- nutch supergroup
/user/nutch/crawl/segments/20080918173443/crawl_generate/part-00001 <r 1> 86 2008-09-18 17:35 rw-r--r-- nutch supergroup
/user/nutch/crawl/segments/20080918173443/crawl_generate/part-00002 <r 1> 442915 2008-09-18 17:35 rw-r--r-- nutch supergroup
-bash-3.00$

This is what I have set in nutch-site.xml, remembering I have 3 hosts:

fetcher.server.delay 0.01
fetcher.threads.fetch 10
fetcher.threads.per.host 30

> >
> >> Date: Thu, 18 Sep 2008 18:34:26 +0300
> >> From: [EMAIL PROTECTED]
> >> To: [email protected]
> >> Subject: Re: running fetches in hadoop
> >>
> >> Hi,
> >>
> >> On Thu, Sep 18, 2008 at 5:23 PM, Edward Quick <[EMAIL PROTECTED]> wrote:
> >> >
> >> > I have 3 hosts in a hadoop cluster and noticed that the fetch only runs
> >> > from one host at a time.
> >> > Is that right or should the fetch run from all 3 hosts at the same time?
> >> >
> >>
> >> Try running generate like this:
> >>
> >> bin/nutch generate <other options> -numFetchers 3
> >>
> >> > Thanks,
> >> >
> >> > Ed.
> >>
> >> --
> >> Doğacan Güney
>
> --
> Doğacan Güney
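
A possible reading of the listing above, offered as a sketch and not a diagnosis: the generator log says it partitions the selected URLs by host, and two of the three parts are 86 bytes while the third is 442915 bytes. If nearly all selected URLs share one host, they would land in the same part, and only the task reading that part would have real work to do. The Python below illustrates that effect only. It is not Nutch's partitioner; `partition_by_host`, the crc32 hash, and the example.com URLs are all hypothetical.

```python
from collections import Counter
from urllib.parse import urlparse
import zlib


def partition_by_host(urls, num_fetchers):
    """Count how many URLs fall into each partition when keyed by host name."""
    counts = Counter()
    for url in urls:
        host = urlparse(url).hostname or ""
        # crc32 is a stand-in hash chosen for determinism; Nutch's own
        # partitioner differs in detail.
        part = zlib.crc32(host.encode("utf-8")) % num_fetchers
        counts[part] += 1
    return [counts.get(i, 0) for i in range(num_fetchers)]


# Hypothetical crawl in which every selected URL lives on the same host.
single_host = [f"http://intranet.example.com/page{i}" for i in range(1000)]
sizes = partition_by_host(single_host, 3)
print(sizes)  # one partition holds all 1000 URLs, the other two are empty
```

If this is what is happening, checking how many distinct hosts the crawldb actually contains would be the next thing to look at.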
