Try increasing the value of this parameter:

<property>
  <name>fetcher.threads.per.host</name>
  <value>1</value>
</property>

This can help if you are crawling many pages from a single host and are
running into time-outs.
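
As a rough sketch, the override could go into conf/nutch-site.xml like this.
The value 2 is only an illustrative choice, not a tested recommendation, and
the enclosing <nutch-conf> element is an assumption about the 0.7.x file
layout, so check it against your own nutch-default.xml:

<nutch-conf>
  <!-- Assumed example value; raise it gradually and watch for time-outs. -->
  <property>
    <name>fetcher.threads.per.host</name>
    <value>2</value>
  </property>
</nutch-conf>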

By the way:
It's important to avoid time-outs because Nutch 0.7.1 has a bug that
prevents the crawler from refetching those pages. See:
http://issues.apache.org/jira/browse/NUTCH-205
(At the moment the Apache JIRA is unavailable.)





On 2/16/06, Franz Werfel <[EMAIL PROTECTED]> wrote:
> Hello, When trying to fetch pages from a specific web site, I end up
> with 80% of the fetches timing out. Those 80% are always the same urls
> (not random) and occur no matter which limit I set in
> fetcher.server.delay and retries (http.max.delays).
> However, those same pages load fine when retrieved from a browser, and
> use no redirect, etc. In fact, they seem no different than the pages
> that do not time out (although they must be different in some way?)
> I am at a loss to understand what is going on. In what direction
> should one go to investigate this problem?
> Thanks,
> Frank.
>


_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
