I investigated the crawler output in more detail and discovered that for
over 90% of the pages that have outlinks one day but none the next (even
though their content has not changed), I can account for the missing
outlinks somewhere else in that day's crawl: they appear either as the
outlinks of another page or as the URL of a page. So it looks like they
aren't fetched because they have already been fetched that day.
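The accounting described above can be sketched roughly as follows. This is only an illustration, assuming the outlinks and page URLs have already been dumped from the crawl segments into plain Python collections; the function name, variable names, and example URLs are hypothetical and not part of the crawler.

```python
def unaccounted_outlinks(previous_outlinks, todays_page_urls, todays_outlinks):
    """Return yesterday's outlinks that do not show up anywhere in today's crawl.

    previous_outlinks: outlinks recorded for one page on the first day
    todays_page_urls:  URLs of all pages in the second day's crawl
    todays_outlinks:   dict mapping each page URL to its outlinks on the second day
    """
    # Everything seen on the second day, either as a page URL or as an outlink.
    seen_today = set(todays_page_urls)
    for links in todays_outlinks.values():
        seen_today.update(links)
    # Whatever is left cannot be explained by "already seen elsewhere that day".
    return sorted(set(previous_outlinks) - seen_today)


# Made-up example data, for illustration only.
previous = ["http://example.com/a", "http://example.com/b", "http://example.com/c"]
page_urls = ["http://example.com/a"]
outlinks_by_page = {"http://example.com/other": ["http://example.com/b"]}

missing = unaccounted_outlinks(previous, page_urls, outlinks_by_page)
print(missing)
```

Any URL left in the result is one of the outlinks that still needs an explanation, for example an HTTP error.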
However, I still don't understand what happened to the other 10%. I checked
a few of the outlinks by hand, and some could not be crawled due to HTTP
errors. Can someone please explain why the rest of the outlinks aren't
stored? Are there some standard things I can check for?
Is this normal behavior? At the moment I'm only looking in the resulting
crawl segment for these outlinks - should I be looking somewhere else?
I'd really, really appreciate some help with this.
Thanks,
Karen
----- Original Message -----
From: "Karen Church" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Tuesday, October 25, 2005 9:14 PM
Subject: Outlinks?
Hi,
I have a strange question regarding outlinks. I have crawled the same page
on two consecutive days. On the first day the page has 10 outlinks, but on
the second day no outlinks are generated/recorded. However, the content of
the page hasn't changed. Can anyone suggest a reason for this? Am I
doing something wrong?
Thanks,
Karen