Hi Andrzej,

Thanks for going into this subject.
I'm glad that this issue will be resolved in version 0.8. That makes
me hopeful. :)

Sure, fixing this bug in version 0.7.1 won't be necessary if version 0.8
becomes available in the next few weeks.
Until then, the workaround works for me: just do complete recrawls and
don't reuse the existing web-db of a previous crawl.
;)

Greetings
Oliver


On 3/8/06, Andrzej Bialecki <[EMAIL PROTECTED]> wrote:
> Thanks for your persistence on this subject... ;-) I agree, it's a real
> issue. Most developers (myself included) concentrate on the 0.8 branch
> now, which has a fix for this.
>
> Basically, the whole premise of pages "truly gone" seems to be
> ill-defined. If we can't reach a page even 1000 times during a given
> period it doesn't automatically mean it's truly gone, it could mean that
> the server is temporarily down and we tried too often in a given
> period... so, as long as the links from other pages are valid we should
> still from time to time attempt to check the status of that page.
>
> That's the reasoning behind the fix that went to 0.8 - if the last fetch
> was a long time ago (longer than a maximum interval for the installation)
> then we force a refetch anyway, and if it doesn't succeed we just increase
> the interval by 50%.
>
> Now, fixing this the same way in 0.7 would mean that pages no longer end
> up in PAGE_GONE state. Is this a fix of broken behavior or a new
> behavior (new feature)? I'm not sure...
>
> --
> Best regards,
> Andrzej Bialecki     <><
>  ___. ___ ___ ___ _ _   __________________________________
> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> ___|||__||  \|  ||  |  Embedded Unix, System Integration
> http://www.sigram.com  Contact: info at sigram dot com
>
>
>
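
The refetch rule described in the quoted message can be sketched roughly as
follows. This is not Nutch code: PageRecord, should_force_refetch,
record_fetch_result and the max_interval parameter are hypothetical names
chosen for illustration. The message does not say what happens to the
interval after a successful fetch or whether it is capped, so the sketch
leaves both open.

```python
from dataclasses import dataclass

DAY = 24 * 60 * 60  # seconds in a day


@dataclass
class PageRecord:
    """Hypothetical per-page state; not the actual Nutch web-db record."""
    last_fetch: float  # time of the last fetch attempt, in seconds
    interval: float    # current refetch interval, in seconds


def should_force_refetch(page: PageRecord, now: float, max_interval: float) -> bool:
    """True if the last fetch is older than the installation's maximum interval."""
    return now - page.last_fetch > max_interval


def record_fetch_result(page: PageRecord, now: float, succeeded: bool) -> PageRecord:
    """Update the page after a fetch attempt.

    On failure the interval grows by 50% instead of the page being marked
    as gone. What happens on success is not stated in the message, so the
    interval is left unchanged here (an assumption).
    """
    page.last_fetch = now
    if not succeeded:
        page.interval *= 1.5
    return page
```

With a 30-day interval and a 90-day maximum, a page last fetched 91 days ago
would be retried, and a failed retry would stretch its interval to 45 days
rather than dropping the page.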


_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
