I am experiencing a very strange phenomenon.

I have a small site, only 7 pages, and my seed list contains a single URL
(the front page of the site). With a depth of 10 (which is just my default;
it's overkill here), I get all of the pages except the front page. I can see
in the logs that the index page is being fetched, and there are no errors, but
it is never passed on to the indexer. I have debug logging turned on, and I can
find nothing unusual, except that this page simply vanishes.

If I list all 7 pages explicitly in the seed list and crawl with a depth of 1,
I get all of the pages, including the index page. If I list the same 7 pages
and crawl with a depth of 2, I don't get the index page, but I do get all the
rest.

What on earth is going on here? I do not have topN set, nor any other unusual
settings. Obviously I could just provide the full URL list and crawl at a depth
of 1, but I really don't want to do that. I can't be certain I'll know when new
pages are added, and I don't want to miss them just because they aren't in my
seed list.

Has anyone ever seen something like this before? I'm eager for any help anyone
might be able to offer.

Thanks,
Ann