Hi,

I am crawling the main pages of some online newspaper web sites. I don't need deletes at all, so I am using the crawl-once model.

Here are the settings I use:

Schedule type: Scan every document once
Start method: Start at beginning of schedule window
Scheduled time: Any day of week at 1 am, 3 am, 5 am, 7 am, 9 am, 11 am, 1 pm, 3 pm, 5 pm, 7 pm, 9 pm, 11 pm plus 0 minutes
Maximum run time: No limit
Maximum hop count for link type 'link': 1
Maximum hop count for link type 'redirect': Unlimited
Hop count mode: No deletes, forever
Include only hosts matching seeds? Yes
Seeds: A few URLs of the form http://main.page.com/{category}, where category is Sports, Politics, etc.

By setting the hop count to 1 (or 2) and the hop count mode to 'No deletes, forever', I expect this crawl to be as fast and efficient as possible, with minimal DB queries. Am I correct?

Thanks,
Ahmet
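
P.S. To make clear what I expect a maximum hop count of 1 to mean, here is a small Python sketch of my understanding. It is only an illustration, not how the crawler is actually implemented: the function name urls_within_hops, the in-memory links dictionary, and the article URLs are all made up for the example. I am assuming seeds count as hop 0 and pages linked directly from a seed count as hop 1.

```python
from collections import deque


def urls_within_hops(seeds, links, max_hops):
    """Return {url: hop_count} for every URL reachable within max_hops links.

    `links` is a hypothetical in-memory map of page URL -> list of linked URLs;
    a real crawler would discover these by fetching and parsing each page.
    """
    hops = {seed: 0 for seed in seeds}  # seeds are at hop count 0
    queue = deque(seeds)
    while queue:
        url = queue.popleft()
        if hops[url] == max_hops:
            continue  # at the limit: do not follow links out of this page
        for target in links.get(url, []):
            if target not in hops:  # first discovery gives the shortest hop count
                hops[target] = hops[url] + 1
                queue.append(target)
    return hops


# Made-up link graph: one seed category page, two articles, one older article.
links = {
    "http://main.page.com/Sports": [
        "http://main.page.com/Sports/a1",
        "http://main.page.com/Sports/a2",
    ],
    "http://main.page.com/Sports/a1": ["http://main.page.com/Sports/old"],
}
result = urls_within_hops(["http://main.page.com/Sports"], links, max_hops=1)
```

With max_hops=1 I would expect the seed page and the two articles it links to, but not the older article that is two links away from the seed.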
