This parser fails to extract outlinks from

http://lucene.apache.org/solr/api/index.html

although the page contains frame elements with src attributes.
I tried to debug why this happens.
It seems that the HtmlParser from Tika is filtering something out:
when I use the TagSoup parser to build the DOM instead, I get the
outlinks as expected.
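
To make the expectation concrete, here is a minimal sketch of what I mean by
extracting outlinks from frame elements. It is not the parser's actual code:
the class and method names (FrameOutlinkSketch, collectFrameSrc, parse) and
the sample frame file names are made up for illustration, and it uses the
JDK's XML parser on a well-formed snippet rather than Tika or TagSoup. The
point is only that once frame elements reach the DOM, a plain tree walk finds
their src values.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper, for illustration only (not part of Nutch or Tika).
class FrameOutlinkSketch {

    // Build a DOM from a well-formed (X)HTML string using the JDK parser.
    // Assumption: a real crawler would obtain the DOM from Tika or TagSoup.
    static Document parse(String xhtml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        return factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
    }

    // Collect the src attribute of every frame and iframe below the node,
    // in document order.
    static List<String> collectFrameSrc(Node root) {
        List<String> links = new ArrayList<>();
        walk(root, links);
        return links;
    }

    private static void walk(Node node, List<String> links) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            Element el = (Element) node;
            String name = el.getNodeName().toLowerCase(Locale.ROOT);
            boolean isFrame = name.equals("frame") || name.equals("iframe");
            if (isFrame && el.hasAttribute("src")) {
                links.add(el.getAttribute("src"));
            }
        }
        // Recurse into children; this also covers the Document node itself.
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            walk(c, links);
        }
    }

    public static void main(String[] args) throws Exception {
        // Sample frameset; the file names are placeholders.
        String page = "<html><frameset>"
                + "<frame src=\"overview-frame.html\"/>"
                + "<frame src=\"allclasses-frame.html\"/>"
                + "</frameset></html>";
        System.out.println(collectFrameSrc(parse(page)));
    }
}
```

If the frame elements never make it into the DOM that the parser hands over,
a walk like this has nothing to find, which would match what I am seeing.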

Regards,
Reinhard
