Hi all,
  I am using the crawl tool in Nutch 0.8.1 under Cygwin, trying to retrieve
pages from about 2,000 websites, and the crawl process has been
running for nearly 20 hours.
   But for the past 10 hours, the fetch status has remained the
same as below:
   TOTAL urls: 165212
   retry 0:    164110
   retry 1:    814
   retry 2:    288
   min score:  0.0
   avg score:  0.029228665
   max score:  2.333
   status 1 (DB_unfetched):    134960
   status 2 (DB_fetched):      27812
   status 3 (DB_gone): 2440
All the numbers in the status remain the same; DB_fetched is always
27812. From the console output and hadoop.log I can see that the
page-fetching process is running without any errors.

The size of the crawl db also has not changed; it stays at 328M.

I have been trying to solve this problem for all of the last week. Any hints
on this problem are appreciated. Thanks and bow~~~
