This is normal behavior, since link extraction is not done before fetching
but during it, when the fetched pages are parsed. So each iteration fetches
the URLs that were discovered in the previous iteration, unless you limit
your segment size.
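For reference, a rough sketch of the 0.7 whole-web cycle (flags from memory of the
tutorial, so double-check them against the usage output of bin/nutch; "db",
"segments" and "urls" are just placeholder paths):

  # one-time setup: create the web db and inject the seed URLs
  bin/nutch admin db -create
  bin/nutch inject db -urlfile urls

  # repeat this cycle; every round fetches what the last round discovered
  bin/nutch generate db segments -topN 1000   # -topN caps the segment size
  s=`ls -d segments/* | tail -1`              # pick the newest segment
  bin/nutch fetch $s
  bin/nutch updatedb db $s

With -topN the segment stays small on purpose; without it, every URL the last
updatedb added to the db becomes due for fetching in the next round.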
HTH
Stefan
On 23.01.2006 at 02:38, WONG KIONG wrote:
Hi all,
I'm now using Nutch 0.7.1 with the whole-web crawling method, and I have
successfully indexed the segments. I crawled 4 websites in total, but in
the end I got only 59 pages and 55 links from the database. Then I
generated another segment and fetched the same websites again; after
updating my database and reading it, I got 328 pages and 451 links. The
third time I even got 879 pages and 1673 links. I wonder why I could only
get 50-odd pages and links on the first fetch, while I got hundreds or
thousands of them on the following ones? Is a result like this strange,
or is it usual?
Before that, I had changed some of the properties in both nutch-
default.xml and nutch-site.xml; the changed properties are listed below
(a nutch-site.xml sketch follows the list):
http.timeout 1000000
http.content.limit -1
http.max.delays 5
fetcher.server.delay 20
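Overrides like these normally go only into nutch-site.xml, so
nutch-default.xml can stay untouched. A minimal sketch, assuming the 0.7
config layout (only two of the properties shown):

  <?xml version="1.0"?>
  <nutch-conf>
    <property>
      <name>http.timeout</name>
      <value>1000000</value>  <!-- network timeout in milliseconds -->
    </property>
    <property>
      <name>fetcher.server.delay</name>
      <value>20</value>       <!-- seconds between requests to the same host -->
    </property>
  </nutch-conf>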
Thank you all very much for your attention to my problems.
---------------------------------------------------------------
company: http://www.media-style.com
forum: http://www.text-mining.org
blog: http://www.find23.net