> > 2008/9/19 Edward Quick <[EMAIL PROTECTED]>:
> >
> > Also forgot to mention, what should mapred.map.tasks and
> > mapred.reduce.tasks be set to?
> >
>
> I haven't run fetcher in distributed mode for a while, but back then,
> fetcher would run as many map tasks as there are
> parts under crawl_generate. So, maybe this has changed. Anyway, try
> setting mapred.map.tasks to 3 as well for fetching.
> I think that may work.
I set it to 3 in hadoop-site.xml and restarted Hadoop before running the
generate. Unfortunately I still only see the fetch running on one box.

> > Thanks,
> >
> > Ed.
> >
> > From: [EMAIL PROTECTED]
> > To: [email protected]
> > Subject: RE: running fetches in hadoop
> > Date: Thu, 18 Sep 2008 19:36:45 +0000
> >
> >> 2008/9/18 Edward Quick <[EMAIL PROTECTED]>:
> >> >
> >> > Thanks Doğacan,
> >> >
> >> > I set numFetchers but only see the fetch being done from one host at
> >> > one time, not all at the same time.
> >> > This is what I ran:
> >> >
> >> > -bash-3.00$ bin/nutch generate crawl/crawldb crawl/segments -numFetchers 3
> >> > Generator: Selecting best-scoring urls due for fetch.
> >> > Generator: starting
> >> > Generator: segment: crawl/segments/20080918173443
> >> > Generator: filtering: true
> >> > Generator: Partitioning selected urls by host, for politeness.
> >> > Generator: done.
> >> > -bash-3.00$ bin/nutch fetch crawl/segments/20080918173443
> >> > Fetcher: starting
> >> > Fetcher: segment: crawl/segments/20080918173443
> >> >
> >>
> >> Hmm, how many parts are under crawl/segments/20080918173443/crawl_generate?
> >
> > -bash-3.00$ bin/hadoop dfs -ls crawl/segments/20080918173443/crawl_generate
> > Found 3 items
> > /user/nutch/crawl/segments/20080918173443/crawl_generate/part-00000 <r 1>  86      2008-09-18 17:35  rw-r--r--  nutch  supergroup
> > /user/nutch/crawl/segments/20080918173443/crawl_generate/part-00001 <r 1>  86      2008-09-18 17:35  rw-r--r--  nutch  supergroup
> > /user/nutch/crawl/segments/20080918173443/crawl_generate/part-00002 <r 1>  442915  2008-09-18 17:35  rw-r--r--  nutch  supergroup
> > -bash-3.00$
> >
> > This is what I have set in nutch-site.xml, remembering I have 3 hosts:
> >
> > fetcher.server.delay      0.01
> > fetcher.threads.fetch     10
> > fetcher.threads.per.host  30
> >
> >>
> >> >
> >> >> Date: Thu, 18 Sep 2008 18:34:26 +0300
> >> >> From: [EMAIL PROTECTED]
> >> >> To: [email protected]
> >> >> Subject: Re: running fetches in hadoop
> >> >>
> >> >> Hi,
> >> >>
> >> >> On Thu, Sep 18, 2008 at 5:23 PM, Edward Quick <[EMAIL PROTECTED]> wrote:
> >> >> >
> >> >> > I have 3 hosts in a hadoop cluster and noticed that the fetch only
> >> >> > runs from one host at a time.
> >> >> > Is that right or should the fetch run from all 3 hosts at the same
> >> >> > time?
> >> >> >
> >> >>
> >> >> Try running generate like this:
> >> >>
> >> >> bin/nutch generate <other options> -numFetchers 3
> >> >>
> >> >> > Thanks,
> >> >> >
> >> >> > Ed.
> >> >>
> >> >> --
> >> >> Doğacan Güney
> >>
> >> --
> >> Doğacan Güney
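
[Editor's note: the three fetcher properties Ed quotes above would sit in conf/nutch-site.xml like this. The values are the ones reported in the thread; the surrounding file layout is a sketch, not a verified config.]

```xml
<?xml version="1.0"?>
<!-- nutch-site.xml fragment: the fetcher settings quoted in the thread.
     Values are those Ed reports for his 3-host cluster. -->
<configuration>
  <property>
    <name>fetcher.server.delay</name>
    <value>0.01</value>
    <!-- seconds to wait between requests to the same server -->
  </property>
  <property>
    <name>fetcher.threads.fetch</name>
    <value>10</value>
    <!-- number of fetcher threads used by a fetch task -->
  </property>
  <property>
    <name>fetcher.threads.per.host</name>
    <value>30</value>
    <!-- maximum concurrent threads allowed against one host -->
  </property>
</configuration>
```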
>
> --
> Doğacan Güney
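
[Editor's note: Doğacan's suggestion at the top of the thread (set mapred.map.tasks to 3, matching the number of crawl_generate parts) would go in conf/hadoop-site.xml. A minimal sketch, assuming the pre-0.20 Hadoop property names used in this era; whether mapred.reduce.tasks should also be 3 is implied but not confirmed in the thread.]

```xml
<?xml version="1.0"?>
<!-- hadoop-site.xml fragment: sketch of the change discussed above.
     Note mapred.map.tasks is only a hint to the framework; the
     Generator's -numFetchers 3 is what fixes the number of fetch lists. -->
<configuration>
  <property>
    <name>mapred.map.tasks</name>
    <value>3</value>
  </property>
  <property>
    <name>mapred.reduce.tasks</name>
    <value>3</value>
  </property>
</configuration>
```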
