This is the behaviour I am noticing with pages that have a server redirect (a 3xx status code):

Say page A redirects to page B. A is in the fetchlist created by generate. When A is fetched, the redirect is followed and B is fetched as well. At the next updatedb, both A and B go into the crawldb. For some reason, B is then listed to be fetched again at the next generate, and again at the generate after that, and so on.


An example is:

http://www.selecthotels.com

which redirects to http://203.210.113.143/ ('page B').
This page always seems to be in the fetchlist, no matter how many times it has already been fetched. (To make matters more complicated, that page in turn redirects to yet another URL.)
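For reference, this is how I have been inspecting what the crawldb actually holds for these pages (a sketch, assuming the 0.8-style readdb tool; the -url option prints the stored CrawlDatum, i.e. status, fetch time, retry count and score, for a single URL):

    # check what the crawldb records for page A and page B
    # ("crawldb" here stands for the path to your crawldb directory)
    bin/nutch readdb crawldb -url http://www.selecthotels.com/
    bin/nutch readdb crawldb -url http://203.210.113.143/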

How do I fix this behaviour?

Also, other URLs whose fetch fails for some reason stay in the crawldb and are retried again and again. For a 'deep' crawl using topN=1000, after a number of runs each generated fetchlist contains many hundreds of these failed URLs that it tries to refetch.

How do I fix this behaviour too?
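In case it's relevant, these are the two properties I suspect are involved, though I haven't confirmed that either one actually governs this. An override in conf/nutch-site.xml would look roughly like this (values are guesses on my part):

    <!-- stop regenerating a URL after this many failed fetch attempts,
         instead of retrying it indefinitely (value is a guess) -->
    <property>
      <name>db.fetch.retry.max</name>
      <value>2</value>
    </property>

    <!-- follow redirects during the fetch itself rather than recording
         the target URL for a later fetch round (value is a guess) -->
    <property>
      <name>http.redirect.max</name>
      <value>3</value>
    </property>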



End of the day for me (NZST). I'll try again tomorrow....

Cheers,
Carl.
