I believe the retry numbers count how many times a page fetch failed
with a recoverable error and was re-processed before the page was
fetched. So most of the pages were fetched on the first try. Some
encountered errors and were fetched on the next try, and so on. The
default is a maximum of 3 retries, set by the db.fetch.retry.max property.
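
To put the retry lines in proportion, here is a minimal Python sketch that
parses the retry counts from the -stats output quoted below and prints each
one as a share of the total. The sample text is copied from this thread; the
retry_counts helper and the regular expression are my own assumptions about
the line layout, not anything that ships with Nutch.

```python
import re

# Lines copied from the `readdb -stats` output quoted in this thread.
STATS = """\
060423 230050 TOTAL urls: 3917224
060423 230050 retry 0: 3852405
060423 230050 retry 1: 60970
060423 230050 retry 2: 1529
060423 230050 retry 3: 2320
"""


def retry_counts(text):
    """Return {retry_number: url_count} parsed from -stats style lines.

    Hypothetical helper: assumes each retry line looks like
    "<date> <time> retry N: COUNT", as in the sample above.
    """
    counts = {}
    for line in text.splitlines():
        match = re.search(r"retry (\d+):\s*(\d+)", line)
        if match:
            counts[int(match.group(1))] = int(match.group(2))
    return counts


counts = retry_counts(STATS)
total = sum(counts.values())

# Print each retry bucket with its percentage of all URLs in the crawldb.
for retry, n in sorted(counts.items()):
    print(f"retry {retry}: {n:>8} ({100.0 * n / total:.2f}%)")
```

For the numbers in this thread the four retry buckets add up to the TOTAL
urls figure, and about 98% of the URLs sit at retry 0.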
Dennis
TDLN wrote:
Running nutch readdb crawl/crawldb -stats gives:
060423 230050 TOTAL urls: 3917224
060423 230050 avg score: 1.311
060423 230050 max score: 34380.355
060423 230050 min score: 1.0
060423 230050 retry 0: 3852405
060423 230050 retry 1: 60970
060423 230050 retry 2: 1529
060423 230050 retry 3: 2320
060423 230050 status 1 (DB_unfetched): 2626041
060423 230050 status 2 (DB_fetched): 1235145
060423 230050 status 3 (DB_gone): 56038
What exactly do the "retry" messages mean?
Rgrds, Thomas
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general