Log when db.max configuration limits reached
--------------------------------------------

         Key: NUTCH-182
         URL: http://issues.apache.org/jira/browse/NUTCH-182
     Project: Nutch
        Type: Improvement
  Components: fetcher  
    Versions: 0.8-dev    
    Reporter: Matt Kangas
    Priority: Trivial


Followup to http://www.nabble.com/Re%3A-Can%27t-index-some-pages-p2480833.html

There are three "db.max" parameters currently in nutch-default.xml:
 * db.max.outlinks.per.page
 * db.max.anchor.length
 * db.max.inlinks

Setting these values too low can cause a site to be under-crawled.
However, nothing is currently written to the log when these limits are
hit, so users have to guess when they need to raise them.

I suggest that we add three new log messages at the appropriate points:
 * "Exceeded db.max.outlinks.per.page for URL "
 * "Exceeded db.max.anchor.length for URL "
 * "Exceeded db.max.inlinks for URL "
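
A minimal sketch of the proposal follows. It is not Nutch code. The class
DbMaxLimits and its methods are hypothetical. java.util.logging stands in for
whatever logger Nutch actually uses, and the limits are passed in as
arguments rather than read from the db.max.* properties. The pattern it shows:
check the limit, log the proposed message with the URL, then truncate.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Logger;

// Hypothetical helper, NOT Nutch's actual code: it only illustrates where the
// proposed "Exceeded db.max.* for URL" messages could be emitted.
final class DbMaxLimits {

  // Assumption: java.util.logging stands in for the logger Nutch really uses.
  private static final Logger LOG = Logger.getLogger(DbMaxLimits.class.getName());

  private DbMaxLimits() {}

  /**
   * Truncates a list of links (outlinks or inlinks) to at most {@code max}
   * entries, logging the proposed message when entries are dropped.
   * {@code param} is the property name, e.g. "db.max.outlinks.per.page".
   * Assumption for this sketch: a negative {@code max} means "no limit".
   */
  static <T> List<T> limitLinks(String param, String url, List<T> links, int max) {
    if (max < 0 || links.size() <= max) {
      return links; // within the limit: nothing dropped, nothing logged
    }
    LOG.info("Exceeded " + param + " for URL " + url);
    return new ArrayList<>(links.subList(0, max));
  }

  /** Truncates anchor text to {@code maxLength} characters, logging when it does. */
  static String limitAnchor(String url, String anchor, int maxLength) {
    if (maxLength < 0 || anchor.length() <= maxLength) {
      return anchor; // short enough: returned unchanged, nothing logged
    }
    LOG.info("Exceeded db.max.anchor.length for URL " + url);
    return anchor.substring(0, maxLength);
  }
}
```

For example, limitLinks("db.max.outlinks.per.page", url, outlinks, 2) on a
page with three outlinks keeps the first two and logs one line naming the URL.
Logging at the point of truncation keeps the check and the message together,
so a log line appears exactly when data is dropped.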

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira



_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
