Felix von Zadow Thu, 23 Mar 2017 10:18:19 -0700
Hi! I found the headings plugin for Nutch 1.x, which extracts content from <h1>, <h2>, ... in HTML pages. Is there a similar plugin for Nutch 2.3.1? Or is there another recommended way to extract content from specific HTML tags?
Thanks! Felix