Thanks, Sebastian. I think I will look into the HtmlParseFilter, since we do have control over the content we are crawling and indexing.

