colin created TIKA-1615:
---------------------------

Summary: HTML fragments with comments before div elements are not being detected as HTML
Key: TIKA-1615
URL: https://issues.apache.org/jira/browse/TIKA-1615
Project: Tika
Issue Type: Bug
Components: detector
Affects Versions: 1.7
Reporter: colin
We are trying to import HTML fragments into Solr. The following fragment is not being detected as HTML:

<!-- test -->
<div> test </div>

When the comment is removed, the fragment is detected and parsed as HTML. That functionality was added by https://issues.apache.org/jira/browse/TIKA-1102

To work around this, we added

<root-XML localName="div"/>
<root-XML localName="DIV"/>

to the <mime-type type="text/html"> element in tika-mimetypes.xml. The fragment is then parsed as expected.
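Why a leading comment can matter, and why the root-XML workaround helps, can be illustrated with a small sketch. This is a hypothetical, simplified model, not Tika's detector code. It assumes a check that only looks at the first bytes for "<div" misses the fragment once a comment precedes it, while a root-element sniffer that skips comments finds "div" and can map it through a table analogous to the root-XML entries. The class and method names (RootElementSniffer, naiveStartsWithDiv, rootLocalName, detect) are invented for illustration. DOCTYPE declarations and CDATA are deliberately not handled, and Java 11+ is assumed.

```java
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical illustration only (NOT Tika's implementation): contrasts a naive
 * prefix check with root-element sniffing that skips leading comments.
 */
final class RootElementSniffer {

    // Hypothetical table, analogous to <root-XML localName="..."/> entries under text/html.
    static final Map<String, String> ROOT_TO_TYPE = Map.of(
            "html", "text/html",
            "div", "text/html");

    /** Naive check: does the content begin with a <div tag? A leading comment defeats it. */
    static boolean naiveStartsWithDiv(String content) {
        return content.stripLeading().toLowerCase(Locale.ROOT).startsWith("<div");
    }

    /** Returns the local name of the first element, skipping whitespace, comments and <?...?>; null if none. */
    static String rootLocalName(String content) {
        int i = 0;
        int n = content.length();
        while (i < n) {
            if (Character.isWhitespace(content.charAt(i))) { i++; continue; }
            if (content.startsWith("<!--", i)) {            // skip an XML/HTML comment
                int end = content.indexOf("-->", i + 4);
                if (end < 0) return null;                    // unterminated comment
                i = end + 3;
                continue;
            }
            if (content.startsWith("<?", i)) {              // skip a processing instruction
                int end = content.indexOf("?>", i + 2);
                if (end < 0) return null;
                i = end + 2;
                continue;
            }
            if (content.charAt(i) != '<') return null;       // text before any element
            int start = i + 1;
            int j = start;
            while (j < n && isNameChar(content.charAt(j))) j++;
            if (j == start) return null;                     // e.g. <!DOCTYPE ...>, not handled here
            String qName = content.substring(start, j);
            int colon = qName.indexOf(':');                  // drop a namespace prefix, if any
            return colon >= 0 ? qName.substring(colon + 1) : qName;
        }
        return null;
    }

    static boolean isNameChar(char c) {
        return Character.isLetterOrDigit(c) || c == '-' || c == '_' || c == '.' || c == ':';
    }

    /** Maps the root element name (lowercased) through the table; null if nothing matches. */
    static String detect(String content) {
        String name = rootLocalName(content);
        return name == null ? null : ROOT_TO_TYPE.get(name.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        String fragment = "<!-- test -->\n<div> test </div>";
        System.out.println(naiveStartsWithDiv(fragment)); // false: the comment comes first
        System.out.println(rootLocalName(fragment));      // div
        System.out.println(detect(fragment));             // text/html
    }
}
```

The sketch lowercases the element name before the lookup. The workaround above lists both div and DIV, which suggests the real root-XML matching is case-sensitive, so this is a simplification rather than a description of Tika's behaviour.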