Hello, how can I crawl only the domains in an injected seed list, without adding external URLs to the database?
Let's say I have 5k URLs in my seed list, and I want Nutch to crawl everything (or up to a few million URLs) for each domain as fast as possible. What settings should I use?

I will have topN at about 20k, and I want db_unfetched to be around 20k for each iteration. What should I set "db.max.outlinks.per.page" to? I was thinking of setting it to 4, to get 4 * 5k = 20k for the first iteration.

Can anyone help me?

Thanks,
James Ford

--
View this message in context: http://lucene.472066.n3.nabble.com/Make-Nutch-to-crawl-internal-urls-only-tp3974397.html
Sent from the Nutch - User mailing list archive at Nabble.com.
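
For reference, the settings in question would normally go in conf/nutch-site.xml. This is only a minimal sketch, assuming "db.ignore.external.links" is the property that keeps outlinks to other hosts out of the database (it compares hosts, so links to other subdomains of a seed domain may be dropped as well). The value 4 is just the estimate from the question, not a recommendation.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Assumption: drop outlinks that point to a different host than the page they were found on -->
  <property>
    <name>db.ignore.external.links</name>
    <value>true</value>
  </property>
  <!-- Illustrative value only: 4 outlinks x 5k seeds gives roughly 20k new URLs after the first round -->
  <property>
    <name>db.max.outlinks.per.page</name>
    <value>4</value>
  </property>
</configuration>
```

Note that the 4 * 5k = 20k estimate only holds for the first round; in later rounds every fetched page can add up to 4 new URLs, so db_unfetched would not necessarily stay near 20k.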

