The Nutch DB stats (and everything else in the crawl DB) will not get updated until you actually run the "updatedb" command on a fetched segment. Nutch does not support real-time updates of this information.
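As a rough sketch of that sequence, the two steps could be wrapped like this. The `crawl` directory, the segment name in the usage comment, and the `update_and_report` helper are assumptions for illustration only; substitute the paths from your own crawl.

```shell
# Assumed locations; override via the environment to match your install.
NUTCH="${NUTCH:-bin/nutch}"
CRAWL_DIR="${CRAWL_DIR:-crawl}"

# Hypothetical helper: merge one fetched segment into the crawl DB,
# then print the DB statistics so the new counts become visible.
update_and_report() {
  segment="$1"
  # Fold the fetch results of this segment into the crawl DB.
  "$NUTCH" updatedb "$CRAWL_DIR/crawldb" "$segment" || return 1
  # The status counts should change only after the step above.
  "$NUTCH" readdb "$CRAWL_DIR/crawldb" -stats
}

# Usage (the segment name is a made-up example):
#   update_and_report crawl/segments/20070103000000
```

Running `readdb -stats` while a fetch is still in progress would be expected to keep printing the old numbers, since the segment has not been merged yet.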
----- Original Message ----
From: Chee Wu <[EMAIL PROTECTED]>
To: [email protected]
Sent: Wednesday, January 3, 2007 7:33:08 AM
Subject: nutch81 pages seems were not kept but no error message found

Hi all,

I am using the crawl tool in Nutch81 under cygwin, trying to retrieve pages from about 2 thousand websites, and the crawl process has been running for nearly 20 hours. But during the past 10 hours, the fetch status has always remained the same, as below:

TOTAL urls: 165212
retry 0: 164110
retry 1: 814
retry 2: 288
min score: 0.0
avg score: 0.029228665
max score: 2.333
status 1 (DB_unfetched): 134960
status 2 (DB_fetched): 27812
status 3 (DB_gone): 2440

All the numbers in the status remain the same; DB_fetched is always 27812. From the console output and hadoop.log I can see that the page fetching process is running without any error. The size of the crawl db has not changed either; it is always 328M. I have been trying to solve this problem for the whole of last week. Any hints for this problem are appreciated.

Thanks and bow~~~
