Log when db.max configuration limits reached
--------------------------------------------
Key: NUTCH-182
URL: http://issues.apache.org/jira/browse/NUTCH-182
Project: Nutch
Type: Improvement
Components: fetcher
Versions: 0.8-dev
Reporter: Matt Kangas
Priority: Trivial
Followup to http://www.nabble.com/Re%3A-Can%27t-index-some-pages-p2480833.html
There are three "db.max" parameters currently in nutch-default.xml:
* db.max.outlinks.per.page
* db.max.anchor.length
* db.max.inlinks
Values that are too low can cause a site to be under-crawled. However,
nothing is currently written to the log when these limits are hit, so users
have to guess when they need to raise them.
I suggest that we add three new log messages at the appropriate points:
* "Exceeded db.max.outlinks.per.page for URL "
* "Exceeded db.max.anchor.length for URL "
* "Exceeded db.max.inlinks for URL "
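As a rough illustration of the idea (not a patch against the actual Nutch
sources), the sketch below shows a limit check that logs one of the proposed
messages when it truncates something. The class name DbMaxLimitLogging, the
helper methods, the use of java.util.logging, and the WARNING level are all
assumptions made for this example. It also assumes the limits are enforced by
simple truncation. The real call sites in Nutch would need to be located
separately.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Logger;

// Hypothetical helper, not part of Nutch: shows where a log call could sit
// relative to a limit check.
final class DbMaxLimitLogging {
    private static final Logger LOG =
        Logger.getLogger(DbMaxLimitLogging.class.getName());

    // Builds one of the proposed messages, e.g.
    // "Exceeded db.max.inlinks for URL http://example.com/".
    public static String exceededMessage(String property, String url) {
        return "Exceeded " + property + " for URL " + url;
    }

    // Keeps at most max items (assumed truncation semantics) and logs
    // only when items are actually dropped.
    public static <T> List<T> limit(List<T> items, int max,
                                    String property, String url) {
        if (items.size() <= max) {
            return items;
        }
        LOG.warning(exceededMessage(property, url));
        return new ArrayList<>(items.subList(0, max));
    }

    // Same idea for anchor text: cut to maxLength characters and log.
    public static String limitAnchor(String anchor, int maxLength, String url) {
        if (anchor.length() <= maxLength) {
            return anchor;
        }
        LOG.warning(exceededMessage("db.max.anchor.length", url));
        return anchor.substring(0, maxLength);
    }
}
```

Logging only when something is actually dropped keeps the log quiet for pages
that stay within the limits, so each message points at a URL where raising
the value would change the crawl.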
_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers