I investigated the crawler output in more detail and found that, for over 90% of the pages that have outlinks one day but none the next (even though their content has not changed), I can account for the missing outlinks elsewhere in that day's crawl: they appear either as outlinks of another page or as the URL of a fetched page. So it looks as though they aren't fetched again because they have already been fetched that day.
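
The cross-check described above can be sketched roughly as follows. This is a hypothetical helper, not Nutch code: it assumes each day's segment has already been dumped and parsed into a mapping from fetched URL to the set of outlink URLs recorded for it (the dumping and parsing are not shown). It returns, for each page that was fetched on day 2 with no outlinks but had outlinks on day 1, the day-1 outlinks that appear nowhere in the day-2 crawl.

```python
from typing import Dict, Set


def unexplained_missing_outlinks(
    day1: Dict[str, Set[str]],
    day2: Dict[str, Set[str]],
) -> Dict[str, Set[str]]:
    """Hypothetical helper: find day-1 outlinks that vanish on day 2
    and are not accounted for anywhere else in the day-2 crawl."""
    # Everything the day-2 crawl knows about: fetched URLs plus every outlink.
    seen_day2: Set[str] = set(day2)
    for links in day2.values():
        seen_day2 |= links

    unexplained: Dict[str, Set[str]] = {}
    for page, links in day1.items():
        # Only pages fetched on day 2 that lost all their outlinks.
        if links and page in day2 and not day2[page]:
            missing = links - seen_day2
            if missing:
                unexplained[page] = missing
    return unexplained


day1 = {"http://a/": {"http://b/", "http://c/", "http://d/"}, "http://b/": set()}
day2 = {"http://a/": set(), "http://b/": {"http://c/"}}
print(unexplained_missing_outlinks(day1, day2))
```

Here `http://b/` is explained (it was fetched on day 2) and `http://c/` is explained (it is an outlink of `http://b/`), so only `http://d/` is reported for `http://a/`. The pages left in the result correspond to the roughly 10% that still need explaining.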

However, I still can't explain what happened to the remaining 10%. I checked a few of these outlinks by hand, and some could not be crawled because of HTTP errors, but can someone please explain why the rest aren't stored? Are there standard things I can check? Is this normal behavior? At the moment I'm only looking for these outlinks in the resulting crawl segment; should I be looking somewhere else?

I'd really, really appreciate some help with this.
Thanks,
Karen

----- Original Message ----- From: "Karen Church" <[EMAIL PROTECTED]>
To: <[email protected]>
Sent: Tuesday, October 25, 2005 9:14 PM
Subject: Outlinks?


Hi,

I have a strange question about outlinks. I crawled the same page on two consecutive days. On the first day the page had 10 outlinks, but on the second day no outlinks were generated or recorded, even though the content of the page hadn't changed. Can anyone suggest a reason for this? Am I doing something wrong?

Thanks,
Karen

