Hi all,
   I am using the crawl tool in Nutch 0.8.1 under Cygwin, trying to
retrieve pages from about two thousand websites, and the crawl process
has been running for nearly 20 hours.
    But during the past 10 hours the fetch status has remained the
same, as shown below:
    TOTAL urls: 165212
    retry 0:    164110
    retry 1:    814
    retry 2:    288
    min score:  0.0
    avg score:  0.029228665
    max score:  2.333
    status 1 (DB_unfetched):    134960
    status 2 (DB_fetched):      27812
    status 3 (DB_gone): 2440
All the numbers in the status remain the same; the DB_fetched count
stays at 27812. From the console output and hadoop.log I can see that
the page-fetching process is running without any errors.

The size of the crawl db has also not changed; it stays at 328 MB.
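For reference, this is roughly how I have been checking the numbers. This is just a sketch assuming the crawl output lives in ./crawl and the log in ./logs/hadoop.log (adjust the paths to your own layout):

```shell
# Dump overall crawldb statistics (the same counters quoted above):
bin/nutch readdb crawl/crawldb -stats

# Watch the log for fetcher activity to confirm pages are still
# actually being fetched:
tail -f logs/hadoop.log | grep -i fetch
```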

I have been trying to solve this problem for the whole of the last
week. Any hints would be appreciated. Thanks and bow~~~

_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers