This is normal behavior, since link extraction is not done before fetching
but during it, when the fetched pages are parsed. So each iteration fetches
the URLs that were discovered in the previous iteration, unless you limit
your segment size.
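For reference, a rough sketch of the 0.7 whole-web cycle (flags from memory of the
tutorial, so double-check them against the usage output of bin/nutch; "db",
"segments" and "urls" are just placeholder paths):

  # one-time setup: create the web db and inject the seed URLs
  bin/nutch admin db -create
  bin/nutch inject db -urlfile urls

  # repeat this cycle; every round fetches what the last round discovered
  bin/nutch generate db segments -topN 1000   # -topN caps the segment size
  s=`ls -d segments/* | tail -1`              # pick the newest segment
  bin/nutch fetch $s
  bin/nutch updatedb db $s

With -topN the segment stays small on purpose; without it, every URL the last
updatedb added to the db becomes due for fetching in the next round.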
HTH
Stefan
On 23.01.2006 at 02:38, WONG KIONG wrote:
Hi all,
I'm now using Nutch 0.7.1 with the whole-web crawling method, and I have
successfully indexed the segments. I crawled 4 websites in total, but in
the end I got only 59 pages and 55 links from the database. Then I
generated another segment and fetched the same websites again; after
updating my database and reading it, I got 328 pages and 451 links. The
third time I even got 879 pages and 1673 links. I wonder why I could only
get 50-odd pages and links on the first fetch, while I got hundreds or
thousands of them on the following ones? Is a result like this strange,
or is it usual?
Before that, I had changed some of the properties in both nutch-
default.xml and nutch-site.xml; the changed properties are listed below
(a nutch-site.xml sketch follows the list):
http.timeout 1000000
http.content.limit -1
http.max.delays 5
fetcher.server.delay 20
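Overrides like these normally go only into nutch-site.xml, so
nutch-default.xml can stay untouched. A minimal sketch, assuming the 0.7
config layout (only two of the properties shown):

  <?xml version="1.0"?>
  <nutch-conf>
    <property>
      <name>http.timeout</name>
      <value>1000000</value>  <!-- network timeout in milliseconds -->
    </property>
    <property>
      <name>fetcher.server.delay</name>
      <value>20</value>       <!-- seconds between requests to the same host -->
    </property>
  </nutch-conf>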
Thank you all very much for your attention to my problems.
---------------------------------------------------------------
company: http://www.media-style.com
forum: http://www.text-mining.org
blog: http://www.find23.net