I have been crawling rather large sites (larger than 10k pages) with the crawl command. It seems like it crawls all the pages twice. Is that normal? I thought it was just removing the segments, but it looks like it crawls all the pages, does some update to the DB, and then crawls them again. If anyone could shed some light on this I would appreciate it.
2nd question: Is there a way to limit a crawl to a number of pages rather than a depth? I would like to limit a crawl to, say, 100 pages, 1,000 pages, or whatever. I could brute-force it by writing a script that watches the logs and then kills the crawler, but I'd rather not go that route. Thanks. Ian
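
P.S. In case it helps clarify the second question, this is roughly the brute-force script I was picturing. The log path, the "fetching" marker, and the process-name string below are just guesses for my own setup, not anything official:

#!/usr/bin/env python3
# Watch the crawler log, count fetched pages, and kill the crawl
# once a page limit is reached. LOG_FILE, FETCH_MARKER and
# CRAWLER_PROC are placeholders -- adjust them for your own setup.
import subprocess
import time

LOG_FILE = "logs/hadoop.log"       # wherever the crawl writes its log
FETCH_MARKER = "fetching "         # pattern assumed to mark one fetched page
PAGE_LIMIT = 1000                  # stop after roughly this many pages
CRAWLER_PROC = "org.apache.nutch"  # string matching the crawl process

def count_fetched():
    # count log lines that look like a page fetch
    count = 0
    with open(LOG_FILE, errors="ignore") as f:
        for line in f:
            if FETCH_MARKER in line:
                count += 1
    return count

while True:
    if count_fetched() >= PAGE_LIMIT:
        subprocess.run(["pkill", "-f", CRAWLER_PROC])  # kill the crawler
        break
    time.sleep(30)  # re-check every 30 seconds

It works, but it feels like a hack, which is why I'm hoping there is a built-in way to cap the page count.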
