I have been crawling rather large sites (larger than 10k pages) with the crawl command. It seems like it crawls all the pages twice. Is that normal? I thought it was just removing the segments, but it looks like it crawls all the pages, does some update to the DB, and then crawls them again. If anyone could shed some light on this I would appreciate it.
2nd question: Is there a way to limit a crawl to a number of pages rather than a depth? I would like to limit a crawl to, say, 100 pages, 1,000 pages, or whatever. I could brute-force it by writing a script that watches the logs and then kills the crawler, but I'd rather not go that route. Thanks. Ian
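
P.S. In case it helps clarify the second question, this is roughly the brute-force script I was picturing. The log path, the "fetching" marker, and the process-name string below are just guesses for my own setup, not anything official:

#!/usr/bin/env python3
# Watch the crawler log, count fetched pages, and kill the crawl
# once a page limit is reached. LOG_FILE, FETCH_MARKER and
# CRAWLER_PROC are placeholders -- adjust them for your own setup.
import subprocess
import time

LOG_FILE = "logs/hadoop.log"       # wherever the crawl writes its log
FETCH_MARKER = "fetching "         # pattern assumed to mark one fetched page
PAGE_LIMIT = 1000                  # stop after roughly this many pages
CRAWLER_PROC = "org.apache.nutch"  # string matching the crawl process

def count_fetched():
    # count log lines that look like a page fetch
    count = 0
    with open(LOG_FILE, errors="ignore") as f:
        for line in f:
            if FETCH_MARKER in line:
                count += 1
    return count

while True:
    if count_fetched() >= PAGE_LIMIT:
        subprocess.run(["pkill", "-f", CRAWLER_PROC])  # kill the crawler
        break
    time.sleep(30)  # re-check every 30 seconds

It works, but it feels like a hack, which is why I'm hoping there is a built-in way to cap the page count.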
