Re: metatags missing with parse-html

2019-10-14 Thread Sebastian Nagel
Hi Dave,
could you share an example document? Which Nutch version is used?
I tried to reproduce the problem without success using Nutch v1.16:
- example document: Test metatags test for metatag extraction
- using parse-html (works)
  > bin/nutch indexchecker -Dmetatags.names='*' \

metatags missing with parse-html

2019-10-11 Thread Dave Beckstrom
Hi Everyone,
It seems like I take one step forward and two steps back. I was using parse-tika and needed to switch to parse-html in order to use a plugin that excludes content such as headers and footers. I have the excludes working with the plugin. But now I see that all of the metatags