Hey, thanks a lot, that worked!! ;-)
BTW, is the issue (NUTCH-205) resolved in 0.8?
Thanks again,
Frank.

On 2/16/06, mos <[EMAIL PROTECTED]> wrote:
> Try increasing the value of the parameter
>
> <property>
>   <name>fetcher.threads.per.host</name>
>   <value>1</value>
> </property>
>
> This could help if you crawl pages from one host and run into
> time-outs.
>
> By the way:
> It's important to avoid time-outs because in Nutch 0.7.1 there is a bug that
> prevents the crawler from refetching those pages. See:
> http://issues.apache.org/jira/browse/NUTCH-205
> (At the moment the Apache JIRA is unavailable.)
>
> On 2/16/06, Franz Werfel <[EMAIL PROTECTED]> wrote:
> > Hello, When trying to fetch pages from a specific web site, I end up
> > with 80% of the fetches timing out. Those 80% are always the same URLs
> > (not random) and occur no matter which limit I set in
> > fetcher.server.delay and retries (http.max.delays).
> > However, those same pages load fine when retrieved from a browser, and
> > use no redirect, etc. In fact, they seem no different from the pages
> > that do not time out (although they must be different in some way?).
> > I am at a loss to understand what is going on. In what direction
> > should one go to investigate this problem?
> > Thanks,
> > Frank.
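
For anyone finding this thread later, the suggestion quoted above might look roughly like the fragment below. This is only a sketch: the value 2 is an arbitrary example (the thread does not say which value was used), and placing the override in conf/nutch-site.xml rather than editing the defaults file is an assumption. The file's root element is left out because it may differ between Nutch versions.

```xml
<!-- Sketch only: override placed in conf/nutch-site.xml (assumed location). -->
<!-- The value 2 is illustrative; the thread only says to increase it from 1. -->
<property>
  <name>fetcher.threads.per.host</name>
  <value>2</value>
</property>
```

Raising this value lets more than one fetcher thread hit the same host at once, so it is probably wise to increase it gradually and keep fetcher.server.delay at a polite setting.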
