Hi everyone,

I found that the getOutlinks function in html-parser/DOMContentUtils.java does not work correctly in some cases. One example is this website: http://blog.donews.com/boyla/. The function returns only 170 links, while the page actually contains many more (Firefox reports 356 links).
When I compare the list of hyperlinks with the one reported by Firefox, the order is identical: the 170th link returned by getOutlinks is the same as the 170th link in Firefox. So the extraction algorithm itself seems to be correct, but there is a bug somewhere that makes it miss the remaining links.

No limit should apply at this point, since the max outlinks parameter is only applied in the updatedb step. Even when I increase max outlinks to 1000, the result stays the same.

Any suggestions would be much appreciated.

Regards,
Giang
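
P.S. In case it helps anyone reproduce the comparison, here is a minimal sketch of the check I am describing: it counts how many leading entries of two ordered link lists agree, and reports whether the shorter list is an exact prefix of the longer one. The class name LinkPrefixCheck, its method names, and the example.com URLs are made up for illustration; this is not Nutch code, and it assumes both lists have already been normalized to comparable URL strings.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Nutch: compares two ordered link lists.
public class LinkPrefixCheck {

    /** Returns the number of leading positions at which both lists hold equal links. */
    public static int commonPrefixLength(List<String> a, List<String> b) {
        int limit = Math.min(a.size(), b.size());
        int i = 0;
        while (i < limit && a.get(i).equals(b.get(i))) {
            i++;
        }
        return i;
    }

    /** True if 'shorter' has fewer entries than 'longer' and matches its beginning exactly. */
    public static boolean isStrictPrefix(List<String> shorter, List<String> longer) {
        return shorter.size() < longer.size()
                && commonPrefixLength(shorter, longer) == shorter.size();
    }

    public static void main(String[] args) {
        // Placeholder data standing in for the parser output and the browser's link list.
        List<String> parsed = Arrays.asList(
                "http://example.com/a", "http://example.com/b");
        List<String> browser = Arrays.asList(
                "http://example.com/a", "http://example.com/b", "http://example.com/c");

        System.out.println(commonPrefixLength(parsed, browser)); // prints 2
        System.out.println(isStrictPrefix(parsed, browser));     // prints true
    }
}
```

With the real data, a common prefix length of 170 together with a true result from isStrictPrefix would confirm that the links that are returned match the browser's list and that only the tail is missing.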
