Hi,
We're testing TIKA-980 (MicrodataContentHandler for Apache Tika), and many
URLs work fine as long as microdata is implemented properly. But we're also
seeing a lot of webmasters putting meta tags with microdata properties right in
the body! They apparently read Google's webmaster page [1] about invisible
microdata and went along adding meta tags to the body as if it were normal
practice.
Whenever a page contains, for example:
<meta content="EUR" itemprop="priceCurrency">
<span itemprop="price">17.50</span>
the MicrodataContentHandler trips over it and cannot assign price to an
itemscope, because the DOM seems to be reordered/normalized, even when I
(in a test) properly close the meta tag. What does Tika do to meta tags in the
content when using the IdentityHtmlMapper? How can we read the meta tag as if
it were just another tag? Is there some switch or setting I've missed?
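To make clear what we expect to see, here is a minimal sketch of the event
order we're after. It is an illustration only: it uses the JDK's own SAX parser
on a well-formed snippet instead of Tika, the class MicrodataSketch is a
hypothetical stand-in (not the TIKA-980 handler), and it assumes a single item
with no nested itemscopes. The point is that the meta element arrives in
document order, inside the itemscope, just like the span.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in for illustration; NOT the TIKA-980 MicrodataContentHandler.
class MicrodataSketch extends DefaultHandler {
    // Properties collected for the single item in this example, in document order.
    private final Map<String, String> props = new LinkedHashMap<>();
    // One entry per open element: the itemprop it carries, or "" if none.
    private final Deque<String> open = new ArrayDeque<>();
    // Text seen since the most recent start tag (good enough for flat markup).
    private final StringBuilder text = new StringBuilder();

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        String prop = atts.getValue("itemprop");
        String content = atts.getValue("content");
        if (prop != null && content != null) {
            // <meta itemprop=... content=...>: the value is the content attribute.
            props.put(prop, content);
            prop = null;
        }
        open.push(prop == null ? "" : prop);
        text.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        String prop = open.pop();
        if (!prop.isEmpty()) {
            // Any other element with itemprop: the value is its text content.
            props.put(prop, text.toString().trim());
        }
    }

    static Map<String, String> extract(String xhtml) throws Exception {
        MicrodataSketch handler = new MicrodataSketch();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xhtml)), handler);
        return handler.props;
    }

    public static void main(String[] args) throws Exception {
        String snippet = "<div itemscope=\"itemscope\">"
                + "<meta content=\"EUR\" itemprop=\"priceCurrency\"/>"
                + "<span itemprop=\"price\">17.50</span>"
                + "</div>";
        System.out.println(extract(snippet));
    }
}
```

With events delivered in this order, both priceCurrency and price end up on the
same item, which is the behaviour we'd like to get out of Tika as well.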
[1]: http://support.google.com/webmasters/bin/answer.py?hl=en&answer=146750
Thanks,
Markus