[ https://issues.apache.org/jira/browse/TIKA-336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chris A. Mattmann resolved TIKA-336.
------------------------------------

    Resolution: Fixed

- fixed in r884340

Yuan-Fang, please test out the latest Tika trunk. I've:

* updated the test-difficult-rdf2.xml file to remove the <?xml header
* updated the tika-mimetypes.xml to detect files that start with <!-- as xml files (as a default magic first check). This forces xmlRoot detection to occur, which is where the specific XML subclass is detected (which is what we want). There, application/rdf+xml is properly detected.

Before, since there was no magic header for <!--, the initial magic result check was null and the mimeTypes detector ended up returning text/plain.

In the future we may want to:

* make xmlRoot extraction occur on text/plain documents
* move the text/plain check to the beginning of the o.a.tika.mime.MimeTypes#getMimeType(byte[] data) function

> More issues with RDF mime detection
> -----------------------------------
>
>                 Key: TIKA-336
>                 URL: https://issues.apache.org/jira/browse/TIKA-336
>             Project: Tika
>          Issue Type: Bug
>          Components: mime
>    Affects Versions: 0.5
>         Environment: several user environments as well as validated in Mattmann's environment.
>            Reporter: Chris A. Mattmann
>            Assignee: Chris A. Mattmann
>             Fix For: 0.6
>
>
> See TIKA-309 for related discussion, but there seem to be further errors in
> RDF mime detection, on the OWL file located here:
> http://www.w3.org/2002/07/owl#
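
For readers following the resolution above, here is a minimal sketch of the two-stage idea it describes: a magic-prefix check that treats content starting with <?xml or <!-- as XML, followed by a look at the root element to pick the specific XML subtype. This is not Tika's code. The class name MimeSketch, the method detect, and the prefix list are hypothetical, and the root element is read with the JDK's StAX parser rather than with Tika's own xmlRoot logic.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical illustration only; names and structure are assumptions, not Tika's API.
final class MimeSketch {
    // Assumed magic prefixes: the XML declaration and, per the fix, a leading comment.
    private static final String[] XML_MAGIC = {"<?xml", "<!--"};
    private static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

    static String detect(byte[] data) {
        // Stage 1: magic check on the first few bytes.
        String head = new String(data, 0, Math.min(data.length, 16), StandardCharsets.US_ASCII);
        boolean looksLikeXml = false;
        for (String magic : XML_MAGIC) {
            if (head.startsWith(magic)) {
                looksLikeXml = true;
            }
        }
        if (!looksLikeXml) {
            // No magic match: the root element is never inspected.
            return "text/plain";
        }

        // Stage 2: inspect the root element to find the specific XML subtype.
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
            factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
            XMLStreamReader reader = factory.createXMLStreamReader(new ByteArrayInputStream(data));
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    if (RDF_NS.equals(reader.getNamespaceURI()) && "RDF".equals(reader.getLocalName())) {
                        return "application/rdf+xml";
                    }
                    break; // only the root element matters
                }
            }
        } catch (XMLStreamException e) {
            // Malformed XML: fall through to the generic XML type.
        }
        return "application/xml";
    }

    public static void main(String[] args) {
        String doc = "<!-- header comment -->\n<rdf:RDF xmlns:rdf=\"" + RDF_NS + "\"/>";
        System.out.println(detect(doc.getBytes(StandardCharsets.UTF_8)));
    }
}
```

In this sketch, removing "<!--" from XML_MAGIC makes the comment-led RDF document fall back to text/plain, which mirrors the behavior reported in the issue before the fix.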