Hi Ahmet,

I would say that setup would be pretty efficient. ManifoldCF will still need to keep records in its jobqueue table for the documents at hopcount=2, but it will never fetch them.
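To illustrate the point about hopcount=2 records, here is a minimal sketch of a hop-limited breadth-first crawl over a toy in-memory link graph. It is not ManifoldCF code: the `crawl` function, the `records` dictionary (a stand-in for the jobqueue table), and the sample URLs are all hypothetical. It also ignores the separate 'redirect' link type and the hop count mode setting.

```python
from collections import deque


def crawl(seeds, links, max_hops):
    """Hop-limited breadth-first crawl over an in-memory link graph.

    `links` maps a URL to the URLs it links to. Returns a dict of
    url -> {"hops": int, "fetched": bool}, a toy stand-in for queue records.
    """
    records = {}
    pending = deque()
    for url in seeds:
        records[url] = {"hops": 0, "fetched": False}  # seeds are hop 0
        pending.append(url)

    while pending:
        url = pending.popleft()
        rec = records[url]
        if rec["hops"] > max_hops:
            # Beyond the hop limit: the record stays, but nothing is fetched.
            continue
        rec["fetched"] = True
        for target in links.get(url, []):
            if target not in records:
                # Breadth-first order means the first discovery is the shortest path.
                records[target] = {"hops": rec["hops"] + 1, "fetched": False}
                pending.append(target)
    return records


# Hypothetical site: one seed page, two levels of links below it.
links = {
    "seed": ["a", "b"],
    "a": ["c"],
    "b": ["c", "d"],
    "c": ["e"],
}
result = crawl(["seed"], links, max_hops=1)
for url, rec in sorted(result.items()):
    print(url, rec["hops"], "fetched" if rec["fetched"] else "recorded only")
```

With a maximum hop count of 1, the pages at hop 2 (`c` and `d`) are discovered and recorded because the hop-1 pages link to them, but they are never fetched. As a result `e` is never discovered at all.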
Karl

On Mon, Jul 1, 2013 at 9:56 AM, Ahmet Arslan <[email protected]> wrote:

> Hi,
>
> I am crawling main pages of some online newspaper web sites.
> I don't need deletes at all. I am using crawl once model.
>
> Here is the settings I use :
>
> Schedule type: Scan every document once
> Start Method : Start at beginning of schedule window
>
> Scheduled time: Any day of week at 1 am 3 am 5 am 7 am 9 am 11 am 1 pm 3
> pm 5 pm 7 pm 9 pm 11 pm plus 0 minutes
> Maximum run time: No limit
>
> Maximum hop count for link type 'link': 1
> Maximum hop count for link type 'redirect': Unlimited
> Hop count mode: No deletes, forever
>
> Include only hosts matching seeds? yes
> Seeds: A few URLs in the form of http://main.page.com/{category} where
> category is Sports, Politics etc.
>
> By setting hop count to 1 ( or 2) and 'no deletes, forever', I am
> expecting this crawl to be super fast and most efficient. Minimal DB
> queries etc. Am I correct?
>
> Thanks,
> Ahmet
