Hi all,

I am using the crawl tool in Nutch 0.8.1 under Cygwin, trying to retrieve pages from about 2,000 websites. The crawl has now been running for nearly 20 hours, but for the past 10 hours the fetch status has stayed exactly the same:

TOTAL urls: 165212
retry 0:    164110
retry 1:    814
retry 2:    288
min score:  0.0
avg score:  0.029228665
max score:  2.333
status 1 (DB_unfetched): 134960
status 2 (DB_fetched):   27812
status 3 (DB_gone):      2440

None of these numbers change; DB_fetched is stuck at 27812. From the console output and hadoop.log I can see that the page-fetching process is running without any errors.
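For reference, the statistics above come from Nutch's readdb tool; I re-run it periodically to check for progress (the crawl db path below is what I use; adjust it to your own layout):

```shell
# Print crawl db statistics (TOTAL urls, retry counts, scores, DB_* status counts).
# Assumes the crawl output lives in ./crawl; change the path to match your setup.
bin/nutch readdb crawl/crawldb -stats
```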
The size of the crawl db also does not change; it stays at 328M. I have been trying to solve this problem for the whole of last week. Any hints would be appreciated.

Thanks and bow~~~
_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers