Hey, thanks a lot, that worked!! ;-) BTW, is the issue resolved in 0.8? Thanks again, Frank.
On 2/16/06, mos <[EMAIL PROTECTED]> wrote:
> Try increasing the value of the parameter
>
> <property>
>   <name>fetcher.threads.per.host</name>
>   <value>1</value>
> </property>
>
> This can help if you are crawling pages from a single host and running
> into time-outs.
>
> By the way: it's important to avoid time-outs, because Nutch 0.7.1 has a
> bug that prevents the crawler from refetching those pages. See:
> http://issues.apache.org/jira/browse/NUTCH-205
> (At the moment the Apache JIRA is unavailable.)
>
> On 2/16/06, Franz Werfel <[EMAIL PROTECTED]> wrote:
> > Hello,
> > When trying to fetch pages from a specific web site, I end up with 80%
> > of the fetches timing out. Those 80% are always the same URLs (not
> > random) and occur no matter which limits I set in fetcher.server.delay
> > and retries (http.max.delays).
> > However, those same pages load fine when retrieved from a browser, and
> > they use no redirects, etc. In fact, they seem no different from the
> > pages that do not time out (although they must differ in some way?).
> > I am at a loss to understand what is going on. In which direction
> > should one investigate this problem?
> > Thanks,
> > Frank.
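
A minimal sketch of what the suggested override might look like, assuming it is placed in conf/nutch-site.xml (which takes precedence over nutch-default.xml); the value of 2 is only an illustrative choice, not a recommendation from the thread:

<!-- conf/nutch-site.xml: local overrides of nutch-default.xml -->
<property>
  <name>fetcher.threads.per.host</name>
  <!-- Hypothetical raised value; the shipped default is 1. -->
  <value>2</value>
</property>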
