Hi Michael,
> I wonder if there is not already a built-in option to exclude HTML
> elements (like a div with a given id or class or other elements like header).
No, there isn't one so far.
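Until such an option exists, the idea can be illustrated outside Nutch: drop the whole subtree of any element that matches a tag name, id or class. The following is a minimal sketch using Python's standard-library `html.parser`. It is not the parse-html plugin's code or the NUTCH-585 patch, and the class name `ExcludingTextExtractor` and its `tags`/`ids`/`classes` parameters are hypothetical. It assumes reasonably well-formed markup, because an unclosed non-void tag inside an excluded element would throw off the depth counter.

```python
from html.parser import HTMLParser

# Void elements never get an end tag, so they must not change the nesting depth.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}


class ExcludingTextExtractor(HTMLParser):
    """Collects page text while skipping whole subtrees that match a rule.

    Hypothetical illustration only; this is not Nutch's parse-html API.
    """

    def __init__(self, tags=(), ids=(), classes=()):
        super().__init__()
        self.tags = set(tags)        # e.g. {"header"}
        self.ids = set(ids)          # e.g. {"toc"}
        self.classes = set(classes)  # e.g. {"sidebar"}
        self.skip_depth = 0          # > 0 while inside an excluded subtree
        self.chunks = []

    def _excluded(self, tag, attrs):
        attr = dict(attrs)
        if tag in self.tags or attr.get("id") in self.ids:
            return True
        # The class attribute may hold several space-separated names.
        return bool(self.classes & set((attr.get("class") or "").split()))

    def handle_starttag(self, tag, attrs):
        if tag in VOID_ELEMENTS:
            return
        # Once inside an excluded element, every nested element deepens the skip.
        if self.skip_depth or self._excluded(tag, attrs):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS:
            return
        if self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-blank text that lies outside every excluded subtree.
        if not self.skip_depth and data.strip():
            self.chunks.append(data.strip())

    def text(self):
        return " ".join(self.chunks)
```

Calling `close()` after `feed()` flushes any buffered trailing text before `text()` is read.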
> I know https://issues.apache.org/jira/browse/NUTCH-585
> I also do not understand why this little patc
Hello,
I use Nutch 1.18 to crawl our documentation with the parse-html plugin. Each
page has elements like TOCs which should not be included.
I know https://issues.apache.org/jira/browse/NUTCH-585 and included one of the
patches.
However, I wonder if there is not already a built-in option to exclude HTML
elements (like a div with a given id or class or other elements like header).