Crawler fetching weird urls

Jeff Van Boxtel Tue, 11 Sep 2007 12:15:40 -0700

I am experiencing a problem where my fetcher tries to grab lots of
URLs that don't exist. For example, it will try to fetch:
 
fetching http://www.ourhost.com/project_files/PROJECTS/000260/WP/0L19MM14.doc/0k07mm10.doc/%200L19MM14.doc/0i13mm4.doc/0I29MM3.PDF/%200L19MM14.doc/
 
No such URL exists, and I can't figure out where the crawler is getting
these strange URLs from. I don't think any of my pages link to anything
like this. I have also seen other (less bizarre) URLs that don't seem to
exist, and there are no links to them anywhere on our site. Is it
possible that the crawldb is getting corrupted? Is there a way to see
where the crawldb got these URLs from? And if the URLs result in a 404
page, is there a way to have them removed from the crawldb?
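
One possible mechanism, offered only as a guess and not confirmed for
this site: if a document URL ends up with a trailing slash, so that it
is treated as a directory, then every relative link resolved against it
adds another path segment. The sketch below uses Python's standard
urljoin to show how the first part of the URL above could be built that
way. The function name resolve_chain and the choice of which file names
link to which are hypothetical; only the base path and file names come
from the log line.

```python
from urllib.parse import urljoin


def resolve_chain(base, relative_links):
    """Resolve each relative link against the previous result.

    A trailing slash is appended after every step, which is the
    assumption being illustrated: each file URL is (wrongly) treated
    as a directory, so the next relative link nests underneath it.
    """
    url = base
    for link in relative_links:
        url = urljoin(url, link) + "/"
    return url


# Base directory and file names taken from the fetch log above.
base = "http://www.ourhost.com/project_files/PROJECTS/000260/WP/"
links = ["0L19MM14.doc", "0k07mm10.doc", "%200L19MM14.doc"]

print(resolve_chain(base, links))
```

If this is what is happening, the place to look would be wherever a
trailing slash gets added to a file URL (a redirect, a link on a listing
page, or URL normalization), not corruption in the crawldb.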
