Hi everyone,

  I found that the getOutlinks function in html-parser/DOMContentUtils.java
doesn't work correctly in some cases. One example is this website:
http://blog.donews.com/boyla/. The function returns only 170 links, while
the page actually contains many more (Firefox returns 356 links!).

  When I compare the link list with the one returned by Firefox, the
order is exactly the same: the 170th link from getOutlinks is the same as
the 170th link in Firefox. So the traversal algorithm itself seems correct,
but extraction stops after the 170th link, which points to a bug somewhere.
No limit is being applied at this point, since the max outlinks parameter
only takes effect in the updatedb step. Even when I increase max outlinks
to 1000, the problem remains.
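  For anyone who wants to reproduce the comparison, here is a minimal sketch.
It assumes both link lists have already been collected as lists of URL
strings (e.g. the getOutlinks result and a list exported from Firefox).
LinkListDiff and its methods are hypothetical helpers written for this
example, not part of Nutch:

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical helper (not part of Nutch): compares the outlinks extracted
 * by getOutlinks with a reference list, e.g. one exported from Firefox.
 */
public class LinkListDiff {

    /** Number of leading links that are identical, in order, in both lists. */
    public static int commonPrefixLength(List<String> extracted, List<String> reference) {
        int n = Math.min(extracted.size(), reference.size());
        int i = 0;
        while (i < n && extracted.get(i).equals(reference.get(i))) {
            i++;
        }
        return i;
    }

    /** First reference link not matched in order, or null if every link matched. */
    public static String firstMissing(List<String> extracted, List<String> reference) {
        int k = commonPrefixLength(extracted, reference);
        return k < reference.size() ? reference.get(k) : null;
    }

    public static void main(String[] args) {
        // Placeholder data; replace with the real getOutlinks and Firefox lists.
        List<String> extracted = Arrays.asList("http://a/1", "http://a/2");
        List<String> reference = Arrays.asList("http://a/1", "http://a/2", "http://a/3");
        int k = commonPrefixLength(extracted, reference);
        System.out.println("Identical prefix: " + k + " of " + reference.size());
        System.out.println("First missing link: " + firstMissing(extracted, reference));
    }
}
```

If the common prefix equals the full size of the getOutlinks list, the
first missing link marks the spot in the page's HTML where extraction stops,
which is probably the place to look for malformed markup or a parser issue.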

  Any suggestions would be much appreciated.

  Regards,
  Giang
