[ 
https://issues.apache.org/jira/browse/TIKA-336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris A. Mattmann resolved TIKA-336.
------------------------------------

    Resolution: Fixed

- fixed in r884340

Yuan-Fang, please test out the latest Tika trunk. I've:

* updated the test-difficult-rdf2.xml file to remove the <?xml header
* updated tika-mimetypes.xml to detect files that start with <!-- as XML 
files (as a default first magic check). This forces xmlRoot detection to 
run, which is where the specific XML subclass is detected (which is what we 
want), and there application/rdf+xml is properly detected. Before, since 
there was no magic header for <!--, the initial magic check returned null 
and the MimeTypes detector ended up returning text/plain.
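
To illustrate the detection order described above, here is a minimal 
sketch in Java. It is not Tika's actual code: the class 
MimeDetectionSketch, its detect and rootLocalName methods, and the 
commentMagic flag are all hypothetical names made up for this example, 
and the root check looks only at the element's local name, ignoring 
namespaces and the rest of the mimetypes configuration.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of "magic check first, then XML root detection".
// This is NOT Tika's MimeTypes implementation.
class MimeDetectionSketch {

    // First element start tag; group 1 is the local name (any prefix is skipped).
    private static final Pattern ROOT =
            Pattern.compile("<(?:[A-Za-z_][\\w.-]*:)?([A-Za-z_][\\w.-]*)");

    // commentMagic = true models the fix: "<!--" counts as an XML magic prefix.
    static String detect(byte[] data, boolean commentMagic) {
        String text = new String(data, StandardCharsets.UTF_8);

        // Step 1: magic check. Without a match we never reach root detection.
        boolean isXml = text.startsWith("<?xml")
                || (commentMagic && text.startsWith("<!--"));
        if (!isXml) {
            return "text/plain";
        }

        // Step 2: root-element check picks the specific XML subtype.
        // Assumption: only the local name is compared, not the namespace.
        if ("RDF".equals(rootLocalName(text))) {
            return "application/rdf+xml";
        }
        return "application/xml";
    }

    // Returns the local name of the first element, or null if none is found.
    static String rootLocalName(String xml) {
        // Drop comments, processing instructions, and a simple DOCTYPE
        // so that the first remaining "<name" is the root element.
        String stripped = xml
                .replaceAll("(?s)<!--.*?-->", "")
                .replaceAll("(?s)<\\?.*?\\?>", "")
                .replaceAll("(?s)<!DOCTYPE[^>]*>", "");
        Matcher m = ROOT.matcher(stripped);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        byte[] rdf = ("<!-- leading comment -->\n"
                + "<rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\"/>")
                .getBytes(StandardCharsets.UTF_8);
        System.out.println("without <!-- magic: " + detect(rdf, false));
        System.out.println("with <!-- magic:    " + detect(rdf, true));
    }
}
```

With the flag off, a file that begins with a comment never gets past the 
magic step, which mirrors the text/plain result reported in this issue. 
With the flag on, the same bytes reach the root check.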

In the future we may want to:

* make xmlRoot extraction occur on text/plain documents
* move the text/plain check to the beginning of the 
o.a.tika.mime.MimeTypes#getMimeType(byte[] data) function

> More issues with RDF mime detection
> -----------------------------------
>
>                 Key: TIKA-336
>                 URL: https://issues.apache.org/jira/browse/TIKA-336
>             Project: Tika
>          Issue Type: Bug
>          Components: mime
>    Affects Versions: 0.5
>         Environment: several user environments as well as validated in 
> Mattmann's environment.
>            Reporter: Chris A. Mattmann
>            Assignee: Chris A. Mattmann
>             Fix For: 0.6
>
>
> See TIKA-309 for related discussion, but there seem to be further errors in 
> RDF mime detection, on the OWL file located here:
> http://www.w3.org/2002/07/owl#

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
