This parser fails to extract outlinks from http://lucene.apache.org/solr/api/index.html,
although the page contains frame elements with src attributes. I have tried to debug why this happens. It seems that Tika's HtmlParser is filtering something out. When I use the TagSoup parser to feed the DOM instead, I get the outlinks as expected.

Regards,
Reinhard
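
For reference, here is a minimal sketch of what "outlinks as expected" means for a frameset page: every src attribute of a frame element resolved against the page URL. It is not the parser's or Tika's code. It uses only the Java standard library with a simple regex, and the class name FrameOutlinks, the method extract, and the sample frameset markup (including the two file names) are hypothetical, chosen only for illustration.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: collects the src targets of <frame>/<iframe> elements.
class FrameOutlinks {
    // Matches <frame ... src="..."> or <iframe ... src='...'>, but not <frameset>.
    private static final Pattern FRAME_SRC = Pattern.compile(
        "<i?frame\\b[^>]*?\\bsrc\\s*=\\s*([\"'])(.*?)\\1",
        Pattern.CASE_INSENSITIVE);

    // Returns the absolute URL of every frame src found in the HTML.
    static List<String> extract(String html, String baseUrl) {
        URI base = URI.create(baseUrl);
        List<String> links = new ArrayList<>();
        Matcher m = FRAME_SRC.matcher(html);
        while (m.find()) {
            // group(2) is the quoted attribute value; resolve it against the page URL.
            links.add(base.resolve(m.group(2)).toString());
        }
        return links;
    }

    public static void main(String[] args) {
        // Assumed sample markup, not the actual content of the page above.
        String html = "<frameset cols=\"20%,80%\">"
            + "<frame src=\"overview-frame.html\" name=\"packageListFrame\">"
            + "<frame src=\"overview-summary.html\" name=\"classFrame\">"
            + "</frameset>";
        extract(html, "http://lucene.apache.org/solr/api/index.html")
            .forEach(System.out::println);
    }
}
```

With the assumed markup, this yields one outlink per frame element, each under http://lucene.apache.org/solr/api/. A parser that drops frame elements before the DOM is built would produce an empty list for the same input.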

