Hey, thanks a lot, that worked!! ;-) BTW, is the issue resolved in 0.8? Thanks again, Frank.
On 2/16/06, mos <[EMAIL PROTECTED]> wrote:
> Try increasing the value of the parameter
>
> <property>
>   <name>fetcher.threads.per.host</name>
>   <value>1</value>
> </property>
>
> This can help if you are crawling pages from a single host and running
> into time-outs.
>
> By the way: it's important to avoid time-outs, because Nutch 0.7.1 has a
> bug that prevents the crawler from refetching those pages. See:
> http://issues.apache.org/jira/browse/NUTCH-205
> (At the moment the Apache JIRA is unavailable.)
>
> On 2/16/06, Franz Werfel <[EMAIL PROTECTED]> wrote:
> > Hello,
> > When trying to fetch pages from a specific web site, I end up with 80%
> > of the fetches timing out. Those 80% are always the same URLs (not
> > random) and occur no matter which limits I set in fetcher.server.delay
> > and retries (http.max.delays).
> > However, those same pages load fine when retrieved from a browser, and
> > they use no redirects, etc. In fact, they seem no different from the
> > pages that do not time out (although they must differ in some way?).
> > I am at a loss to understand what is going on. In which direction
> > should one investigate this problem?
> > Thanks,
> > Frank.
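
A minimal sketch of what the suggested override might look like, assuming it is placed in conf/nutch-site.xml (which takes precedence over nutch-default.xml); the value of 2 is only an illustrative choice, not a recommendation from the thread:

<!-- conf/nutch-site.xml: local overrides of nutch-default.xml -->
<property>
  <name>fetcher.threads.per.host</name>
  <!-- Hypothetical raised value; the shipped default is 1. -->
  <value>2</value>
</property>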
