Hi, please change the value of the "db.max.outlinks.per.page" property (the default is 100) to, say, 1000:
<property>
  <name>db.max.outlinks.per.page</name>
  <value>1000</value>
  <description>The maximum number of outlinks that we'll process for a page.</description>
</property>

/Jack

On 1/20/06, Nguyen Ngoc Giang <[EMAIL PROTECTED]> wrote:
> Hi everyone,
>
> I found that getOutlinks function in html-parser/DOMContentUtils.java
> doesn't work correctly for some cases. An example is this website:
> http://blog.donews.com/boyla/. The function returns only 170 records, while
> in fact it contains a lot more (Firefox returns 356 links!).
>
> When I compare the hyperlink list with the one returned by Firefox, the
> orders are exactly identical, meaning that the 170th link of getOutlinks
> function is the same as the 170th link of Firefox. Therefore, it seems that
> the algorithm is correct, but there is some bug around. There is no
> threshold at this point, since the max outlinks parameter is set at updatedb
> part. Even when I increase the max outlinks to 1000, the situation still
> remains.
>
> Any suggestions are very appreciated.
>
> Regards,
> Giang
>

--
Keep Discovering ... ...
http://www.jroller.com/page/jmars

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
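For readers unsure how such an override takes effect: Nutch ships its defaults in nutch-default.xml and lets a <property> block in nutch-site.xml override them. The Java below is NOT Nutch's own configuration code. It is a hypothetical, self-contained sketch (the class OutlinkLimit and the method readIntProperty are names made up here) that looks up db.max.outlinks.per.page in a <configuration> XML document and falls back to the default of 100 when the property is absent.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical illustration only; not part of Nutch.
class OutlinkLimit {
    // Default value quoted in the reply above.
    static final int DEFAULT_MAX_OUTLINKS = 100;

    /**
     * Returns the integer value of property `name` from a Hadoop/Nutch-style
     * <configuration> XML string, or `fallback` if the property is not set.
     */
    static int readIntProperty(String xml, String name, int fallback) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList props = doc.getElementsByTagName("property");
            for (int i = 0; i < props.getLength(); i++) {
                Element prop = (Element) props.item(i);
                NodeList names = prop.getElementsByTagName("name");
                NodeList values = prop.getElementsByTagName("value");
                // Match on the <name> child; return the sibling <value>.
                if (names.getLength() > 0 && values.getLength() > 0
                        && name.equals(names.item(0).getTextContent().trim())) {
                    return Integer.parseInt(values.item(0).getTextContent().trim());
                }
            }
            return fallback; // property not overridden: keep the default
        } catch (Exception e) {
            throw new IllegalArgumentException("could not read configuration XML", e);
        }
    }

    public static void main(String[] args) {
        String site = "<configuration><property>"
                + "<name>db.max.outlinks.per.page</name><value>1000</value>"
                + "</property></configuration>";
        // Override present: prints 1000
        System.out.println(readIntProperty(site, "db.max.outlinks.per.page", DEFAULT_MAX_OUTLINKS));
        // No override: prints the default, 100
        System.out.println(readIntProperty("<configuration/>", "db.max.outlinks.per.page", DEFAULT_MAX_OUTLINKS));
    }
}
```

The point of the sketch is the lookup order: an explicit value in the site file wins, and only a missing property falls through to 100.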
