Hi Dave,
Could you share an example document? Which Nutch version are you using?
I tried to reproduce the problem without success using Nutch v1.16:
- example document:
Test metatags
test for metatag extraction
- using parse-html (works)
> bin/nutch indexchecker -Dmetatags.names='*' \
Hi Everyone,
It seems like I take 1 step forward and 2 steps backwards.
I was using parse-tika and I needed to change to parse-html in order to use
a plug-in for excluding content such as headers and footers.
I have the excludes working with the plug-in. But now I see that all of the
metatags