Hi Andrzej,

Thanks for looking into this subject. I'm glad that this issue will be resolved in version 0.8. That makes me hopeful. :)
Sure, fixing this bug in version 0.7.1 won't be necessary if version 0.8 becomes available in the next few weeks. Until then the workaround is fine for me: just do complete recrawls and don't reuse the existing web-db from a previous crawl. ;)

Greetings,
Oliver

On 3/8/06, Andrzej Bialecki <[EMAIL PROTECTED]> wrote:
> Thanks for your persistence on this subject... ;-) I agree, it's a real
> issue. Most developers (myself included) concentrate on the 0.8 branch now,
> which has a fix for this.
>
> Basically, the whole premise of pages being "truly gone" seems to be
> ill-defined. If we can't reach a page even 1000 times during a given
> period it doesn't automatically mean it's truly gone; it could mean that
> the server is temporarily down and we tried too often in a given
> period... so, as long as the links from other pages are valid we should
> still from time to time attempt to check the status of that page.
>
> That's the reasoning behind the fix that went into 0.8: if the last fetch
> was a long time ago (longer than a maximum interval for the installation)
> then we force a refetch anyway, and if it doesn't succeed we just increase
> the interval by 50%.
>
> Now, fixing this the same way in 0.7 would mean that pages no longer end
> up in the PAGE_GONE state. Is this a fix of broken behavior or a new
> behavior (new feature)? I'm not sure...
>
> --
> Best regards,
> Andrzej Bialecki     <><
>  ___. ___ ___ ___ _ _   __________________________________
> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> ___|||__||  \|  ||  |  Embedded Unix, System Integration
> http://www.sigram.com  Contact: info at sigram dot com

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
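
For anyone following the thread, the refetch rule Andrzej describes in the quoted message can be sketched roughly as below. This is only an illustration of the reasoning, not the actual 0.8 code: the class name `RefetchSketch`, the method names, and the 90-day maximum interval are all assumptions made up for the example.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical sketch of the rule from the quoted message; not Nutch code.
public class RefetchSketch {

    // True when the last fetch is older than the installation's maximum
    // interval, in which case a refetch is forced regardless of page status.
    static boolean shouldForceRefetch(Instant lastFetch, Instant now, Duration maxInterval) {
        return Duration.between(lastFetch, now).compareTo(maxInterval) > 0;
    }

    // After a forced refetch fails, grow the interval by 50% instead of
    // marking the page as permanently gone.
    static Duration backoff(Duration interval) {
        return interval.plus(interval.dividedBy(2));
    }

    public static void main(String[] args) {
        Instant now = Instant.now();
        Duration maxInterval = Duration.ofDays(90);   // assumed value
        Duration interval = Duration.ofDays(30);      // assumed value
        Instant lastFetch = now.minus(Duration.ofDays(120));

        if (shouldForceRefetch(lastFetch, now, maxInterval)) {
            boolean fetchSucceeded = false;           // pretend the server is still down
            if (!fetchSucceeded) {
                interval = backoff(interval);
            }
        }
        System.out.println("next interval: " + interval);
    }
}
```

The point of the design is that a page never reaches a terminal "gone" state: it is simply retried less and less often while links to it remain valid.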
