> 
> 2008/9/19 Edward Quick <[EMAIL PROTECTED]>:
> >
> > Also, I forgot to mention: what should mapred.map.tasks and
> > mapred.reduce.tasks be set to?
> >
> 
> I haven't run the fetcher in distributed mode for a while, but back
> then the fetcher would run as many map tasks as there were parts under
> crawl_generate. That may have changed since. Anyway, try setting
> mapred.map.tasks to 3 as well for fetching; I think that may work.
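> 
> For reference, the property would look something like this in
> hadoop-site.xml (standard Hadoop property syntax; adjust the value to
> the number of nodes you want fetching):
> 
> <property>
>   <name>mapred.map.tasks</name>
>   <value>3</value>
> </property>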

I set it to 3 in hadoop-site.xml and restarted Hadoop before running 
generate again.
Unfortunately I still only see the fetch running on one box. Looking at 
the crawl_generate listing further down, two of the three parts are only 
86 bytes (essentially empty), so it seems nearly all the URLs were 
partitioned into a single part, which would leave only one map task with 
any real work to do.

> 
> > Thanks,
> >
> > Ed.
> >
> > From: [EMAIL PROTECTED]
> > To: [email protected]
> > Subject: RE: running fetches in hadoop
> > Date: Thu, 18 Sep 2008 19:36:45 +0000
> >
> >>
> >> 2008/9/18 Edward Quick <[EMAIL PROTECTED]>:
> >> >
> >> > Thanks Doğacan,
> >> >
> >> > I set numFetchers, but I only see the fetch being done from one 
> >> > host at a time, not from all of them at once.
> >> > This is what I ran:
> >> >
> >> > -bash-3.00$ bin/nutch generate crawl/crawldb crawl/segments -numFetchers 3
> >> > Generator: Selecting best-scoring urls due for fetch.
> >> > Generator: starting
> >> > Generator: segment: crawl/segments/20080918173443
> >> > Generator: filtering: true
> >> > Generator: Partitioning selected urls by host, for politeness.
> >> > Generator: done.
> >> > -bash-3.00$ bin/nutch fetch crawl/segments/20080918173443
> >> > Fetcher: starting
> >> > Fetcher: segment: crawl/segments/20080918173443
> >> >
> >>
> >> Hmm, how many parts are under crawl/segments/20080918173443/crawl_generate?
> >
> > -bash-3.00$ bin/hadoop dfs -ls crawl/segments/20080918173443/crawl_generate
> > Found 3 items
> > /user/nutch/crawl/segments/20080918173443/crawl_generate/part-00000  <r 1>  86      2008-09-18 17:35  rw-r--r--  nutch  supergroup
> > /user/nutch/crawl/segments/20080918173443/crawl_generate/part-00001  <r 1>  86      2008-09-18 17:35  rw-r--r--  nutch  supergroup
> > /user/nutch/crawl/segments/20080918173443/crawl_generate/part-00002  <r 1>  442915  2008-09-18 17:35  rw-r--r--  nutch  supergroup
> > -bash-3.00$
> >
> > This is what I have set in nutch-site.xml, bearing in mind I have 3 hosts:
> > fetcher.server.delay 0.01
> > fetcher.threads.fetch 10
> > fetcher.threads.per.host 30
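> >
> > Written out as properties, that is (same values as above, just for
> > clarity):
> >
> > <property>
> >   <name>fetcher.server.delay</name>
> >   <value>0.01</value>
> > </property>
> > <property>
> >   <name>fetcher.threads.fetch</name>
> >   <value>10</value>
> > </property>
> > <property>
> >   <name>fetcher.threads.per.host</name>
> >   <value>30</value>
> > </property>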
> >
> >>
> >> >
> >> >
> >> >
> >> >> Date: Thu, 18 Sep 2008 18:34:26 +0300
> >> >> From: [EMAIL PROTECTED]
> >> >> To: [email protected]
> >> >> Subject: Re: running fetches in hadoop
> >> >>
> >> >> Hi,
> >> >>
> >> >> On Thu, Sep 18, 2008 at 5:23 PM, Edward Quick <[EMAIL PROTECTED]> wrote:
> >> >> >
> >> >> > I have 3 hosts in a Hadoop cluster and noticed that the fetch 
> >> >> > only runs from one host at a time.
> >> >> > Is that right, or should the fetch run from all 3 hosts at the 
> >> >> > same time?
> >> >> >
> >> >>
> >> >> Try running generate like this:
> >> >>
> >> >> bin/nutch generate <other options> -numFetchers 3
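> >> >>
> >> >> After generate finishes, you can check how many fetch lists were
> >> >> created with:
> >> >>
> >> >> bin/hadoop dfs -ls crawl/segments/<segment>/crawl_generate
> >> >>
> >> >> The fetcher should run one map task per part listed there.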
> >> >>
> >> >> > Thanks,
> >> >> >
> >> >> > Ed.
> >> >> >
> >> >>
> >> >>
> >> >>
> >> >> --
> >> >> Doğacan Güney
> >> >
> >>
> >>
> >>
> >> --
> >> Doğacan Güney
> >
> 
> 
> 
> -- 
> Doğacan Güney
