RE: Runaway fetcher threads

Arkadi.Kosmynin Mon, 19 Dec 2011 14:58:32 -0800

Hi Markus,

> -----Original Message-----
> From: Markus Jelsma [mailto:[email protected]]
> Sent: Monday, 19 December 2011 9:24 PM
> To: [email protected]
> Subject: Re: Runaway fetcher threads
> 
> 
> 
> On Monday 19 December 2011 08:32:53 [email protected] wrote:
> > Hi,
> >
> > I've observed an interesting phenomenon that is not hard to reproduce and
> > that I think should not be happening:
> >
> > If you have N fetcher threads, inject, say, 2xN URLs of VERY large files
> > plus a few smaller files to fetch, and run something that uses
> > org.apache.nutch.crawl.Crawl. The big files will take forever to download
> > and the threads will be killed. The process will then proceed to the
> > indexing stage. However, you will see fetcher thread output in the logs
> > intermixed with the output of the indexer. This shows that they were not
> > terminated properly (or at all?).
> 
> Hi, what version are you running? Sounds like an old one. Can you try
> with a more recent version if that is the case?


I am using 1.4, the latest release.


> 
> In any case, if this is about evenly distributing files across fetch
> lists, this cannot be based on file size, as it is unknown beforehand.
> That is only possible when recrawling large files with a modified
> generator and updater that adds the Content-Length field as CrawlDatum
> metadata.

No, this is not related to evenly distributing files across fetch lists.
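
To illustrate the kind of behaviour described above, here is a minimal sketch in plain Java. It is not Nutch code: the class RunawayThreadSketch and all of its methods are hypothetical names made up for this example, and the sleep loop only stands in for a slow download. It contrasts a caller that stops waiting for a worker after a timeout with one that also interrupts the worker. In the first case the worker should still be alive after the caller has moved on, which is how output from "killed" threads could end up mixed into a later stage.

```java
import java.util.concurrent.TimeUnit;

public class RunawayThreadSketch {

    /** Starts a worker that simulates a slow download lasting about 5 seconds. */
    static Thread startSlowWorker(String name) {
        Thread t = new Thread(() -> {
            try {
                for (int i = 0; i < 50; i++) {
                    // Stand-in for a blocking read of a very large file.
                    TimeUnit.MILLISECONDS.sleep(100);
                }
            } catch (InterruptedException e) {
                // Interrupted: restore the flag and stop promptly.
                Thread.currentThread().interrupt();
            }
        }, name);
        t.start();
        return t;
    }

    /** Waits up to timeoutMs, then gives up WITHOUT stopping the worker. */
    static boolean abandonAfterTimeout(Thread worker, long timeoutMs)
            throws InterruptedException {
        worker.join(timeoutMs);
        return worker.isAlive(); // true means the worker is still running
    }

    /** Waits up to timeoutMs, then interrupts the worker and waits for it to exit. */
    static boolean interruptAfterTimeout(Thread worker, long timeoutMs)
            throws InterruptedException {
        worker.join(timeoutMs);
        worker.interrupt();
        worker.join(1000);
        return worker.isAlive(); // false means the worker has terminated
    }

    public static void main(String[] args) throws InterruptedException {
        Thread abandoned = startSlowWorker("abandoned");
        System.out.println("abandoned worker still alive: "
                + abandonAfterTimeout(abandoned, 200));

        Thread stopped = startSlowWorker("stopped");
        System.out.println("interrupted worker still alive: "
                + interruptAfterTimeout(stopped, 200));

        abandoned.interrupt(); // clean up so the JVM can exit promptly
    }
}
```

Note that this sketch relies on the worker being blocked in an interruptible call. A thread blocked in a classic java.io socket read does not react to Thread.interrupt(), so in that situation the underlying connection would have to be closed instead.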

> 
> >
> > Regards,
> >
> > Arkadi
> 
> --
> Markus Jelsma - CTO - Openindex
