Re: some questions about the crawling with Nutch

[email protected] Sun, 17 Jul 2011 22:38:22 -0700

Hi,
What the optimal parameters are would require some experimentation.
But with the right db.fetch.interval.max (the maximum time between two
fetches of a page, set in nutch-default.xml) and a scheduled daily crawl, you
would eventually be able to crawl through all of the pages. Here you may like
to restrict the crawl to the domain you want crawled, so that it does not move
out to other sites (by appropriate changes in crawl-urlfilter.txt).
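
As a rough sketch of the kind of change meant for crawl-urlfilter.txt, the rules below accept only one domain and reject everything else. The domain example.com is only a placeholder, and your file will probably contain other rules (for file suffixes and so on) that you want to keep. As far as I know the first rule that matches a URL decides, so the final catch-all line has to come last.

```
# accept URLs on example.com and its subdomains (placeholder domain)
+^http://([a-z0-9]*\.)*example\.com/

# skip everything else
-.
```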


Otherwise you may schedule separate crawls with different depths for
different domains (or run them manually).
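
Something like the following crontab entries could do that. This is only a sketch: the install path /opt/nutch, the seed and crawl directory names, and the depth and topN values are all made up for illustration. It also assumes your Nutch version has the one-step "crawl" command and lets you run it again on an existing crawl directory, so please check that first.

```
# m h dom mon dow  command
# shallow daily crawl of the first site (hypothetical paths and values)
0 2 * * *  cd /opt/nutch && bin/nutch crawl urls/site-a -dir crawl/site-a -depth 3 -topN 1000
# deeper daily crawl of the second site
0 4 * * *  cd /opt/nutch && bin/nutch crawl urls/site-b -dir crawl/site-b -depth 10 -topN 1000
```

Each crawl here has its own seed list and its own crawl directory, so the URL filter has to accept both domains.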

There can be many solutions; it depends on what suits you.

I don't know much about your 2nd question, so I can't comment on that.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/some-questions-about-the-crawling-with-Nutch-tp3173828p3178201.html
Sent from the Nutch - User mailing list archive at Nabble.com.
