Hello, I'm using Nutch 1.8 and trying to index HTML metadata in Solr. I followed the steps for parsing metatags and had no issues while using parse-html to parse HTML. The problem arises when I modify parse-plugins.xml so that HTML documents are parsed with Tika. When Tika parses a document and plugin.includes lists both parse-metatags and index-metadata, each of the configured metadata fields shows up twice. For instance, running indexchecker lists metatag.description twice, with identical content.
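For reference, the parse-plugins.xml change described here amounts to something like the excerpt below, which maps the text/html MIME type to the Tika parser instead of parse-html. This is an assumed reconstruction for illustration, not a copy of the actual file; the rest of the file (other MIME types and the aliases section that declares parse-tika) is left as shipped.

```xml
<!-- parse-plugins.xml (excerpt; assumed reconstruction, not the actual file) -->
<parse-plugins>
  <!-- Route HTML documents to Tika instead of parse-html -->
  <mimeType name="text/html">
    <plugin id="parse-tika" />
  </mimeType>
  <!-- other mimeType entries and the aliases section omitted -->
</parse-plugins>
```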
For example, indexchecker prints:

metatag.description : CONCORD, N.H. -- September's primary for the Republican nomination for governor pits Walt Havenstein
metatag.description : CONCORD, N.H. -- September's primary for the Republican nomination for governor pits Walt Havenstein

Likewise, actually indexing into Solr makes Solr complain that the field must allow multiple values, and setting multiValued="true" causes two identical values to be indexed for that field.

I need to parse HTML pages with Tika because I'm using Boilerpipe, so I can't just use parse-html, and I can't figure out why this issue only shows up with Tika. Any ideas?

Best,
Jonathan

