Kamil Wnuk wrote:
In UpdateDatabaseTool, the function pageGone( ... ) sets pages that have remained unreachable for a certain number of retries to never be fetched. Is there a compelling reason to keep such pages around? It seems like the right thing to do in this case would be to just remove the page from the webdb with "webdb.deletePage( oldPage )" in order to keep the webdb from accumulating data about pages that no longer exist. I would be happy to submit this change if anyone is interested, otherwise please let me know why the current implementation is necessary.

The purpose of this is to avoid wasting time trying to fetch these pages again if other references to them are encountered. Imagine a large site with a bad link on it that you re-crawl frequently. Should we keep re-learning that the link is bad, or should we remember?
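To make the trade-off concrete, here is a minimal sketch in Java. The class `DeadLinkMemory` and its methods are hypothetical illustrations, not the actual WebDB or UpdateDatabaseTool API. A page that fails `maxRetries` times is kept as a "never fetch" record rather than deleted, so a link to it found on a later crawl is skipped instead of being fetched and failing all over again.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical, simplified stand-in for a crawl database (NOT Nutch's WebDB API).
 * Pages that stay unreachable for maxRetries attempts are kept as "never fetch"
 * records instead of being deleted, so later links to them are skipped.
 */
class DeadLinkMemory {
    // Marker meaning "we gave up on this page; do not fetch it again".
    private static final int NEVER_FETCH = -1;

    private final int maxRetries;
    // URL -> consecutive failure count, or NEVER_FETCH once the page is given up on.
    private final Map<String, Integer> failures = new HashMap<>();

    DeadLinkMemory(int maxRetries) {
        this.maxRetries = maxRetries;
    }

    /** Called when a fetch of url fails. */
    void recordFailure(String url) {
        int count = failures.getOrDefault(url, 0);
        if (count == NEVER_FETCH) {
            return; // already marked dead; nothing more to learn
        }
        count++;
        // After maxRetries failures, keep a tombstone instead of deleting the record.
        failures.put(url, count >= maxRetries ? NEVER_FETCH : count);
    }

    /** Called when a fetch of url succeeds: forget any earlier failures. */
    void recordSuccess(String url) {
        failures.remove(url);
    }

    /** Called when a link to url is discovered; returns false for known-dead pages. */
    boolean shouldFetch(String url) {
        return failures.getOrDefault(url, 0) != NEVER_FETCH;
    }
}
```

If the record were deleted instead, as proposed, `shouldFetch` would return true again the next time the bad link is seen, and the crawler would restart the whole retry cycle for a page already known to be gone.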

Doug
