[ 
https://issues.apache.org/jira/browse/TIKA-309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12781311#action_12781311
 ] 

Yuan-Fang Li commented on TIKA-309:
-----------------------------------

Hi Chris, Jukka,

Yes, the Tika tests are passing for me. However, my test for one of the 
ontologies ("http://www.w3.org/2002/07/owl#") is still failing, and here is 
why. 

In tika-core/src/test/java/org/apache/tika/mime/MimeDetectionTest.java, 
the method testUrl(String expected, String url, String file) actually tests 
the content of the file named by "file", with the url serving only as a hint 
for the detection. My test, however, opens an input stream on the actual URL 
and uses that to detect the mime type. For the above URL, Tika tests against 
the file named "test-difficult-rdf2.xml". The only difference I can see between 
this file and the actual content of the URL is the one line at the top: "<?xml 
version='1.0' encoding='ISO-8859-1'?>". This line is present in the Tika test 
file but not in the content served at the URL.
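
A quick way to confirm that difference without involving Tika at all is a 
check like the one below. It is only a sketch: the class and method names are 
made up for illustration, it uses nothing but the JDK, and it does not 
reproduce Tika's detection logic. It merely reports whether a stream starts 
with "<?xml" (no handling of a byte order mark or leading whitespace), and 
resets the stream so it can still be handed to a detector afterwards.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of Tika.
public class XmlDeclarationCheck {

    /**
     * Returns true if the stream begins with the bytes "<?xml".
     * The stream must support mark/reset (e.g. BufferedInputStream);
     * it is reset to its starting position before returning.
     */
    static boolean startsWithXmlDeclaration(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        byte[] expected = "<?xml".getBytes(StandardCharsets.US_ASCII);
        in.mark(expected.length);
        try {
            byte[] head = new byte[expected.length];
            int n = 0;
            // read() may return fewer bytes than requested, so loop.
            while (n < head.length) {
                int r = in.read(head, n, head.length - n);
                if (r < 0) {
                    break; // stream ended early
                }
                n += r;
            }
            return n == expected.length && Arrays.equals(head, expected);
        } finally {
            in.reset();
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-ins for the test file (with declaration) and the URL content (without).
        byte[] withDecl = "<?xml version='1.0' encoding='ISO-8859-1'?>\n<rdf:RDF/>"
                .getBytes(StandardCharsets.ISO_8859_1);
        byte[] withoutDecl = "<rdf:RDF/>".getBytes(StandardCharsets.ISO_8859_1);

        System.out.println(startsWithXmlDeclaration(new ByteArrayInputStream(withDecl)));    // true
        System.out.println(startsWithXmlDeclaration(new ByteArrayInputStream(withoutDecl))); // false
    }
}
```

To run it against the real content, wrap url.openStream() in a 
BufferedInputStream and pass that in place of the in-memory samples.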

So, if you remove or comment out that line in "test-difficult-rdf2.xml" and 
run the test with the following Maven command: mvn -Dtest=MimeDetectionTest test, 
it will fail. Or, you could use the following test case to test against the 
real URL.

    @Test
    public void testRDFStreamMimeType() throws IOException {
        URL url = new URL("http://www.w3.org/2002/07/owl#");
        final InputStream stream = new BufferedInputStream(url.openStream());
        try {
            MimeTypes mimeTypes =
                TikaConfig.getDefaultConfig().getMimeRepository();
            Metadata metadata = new Metadata();
            String mime = mimeTypes.detect(stream, metadata).toString();
            assertEquals("application/rdf+xml", mime);
        } finally {
            stream.close();
        }
    }

Cheers
Yuan-Fang

> Mime type application/rdf+xml not correctly detected
> ----------------------------------------------------
>
>                 Key: TIKA-309
>                 URL: https://issues.apache.org/jira/browse/TIKA-309
>             Project: Tika
>          Issue Type: Bug
>          Components: mime
>    Affects Versions: 0.5
>            Reporter: Yuan-Fang Li
>            Assignee: Chris A. Mattmann
>            Priority: Minor
>             Fix For: 0.5
>
>
> Mime type detector using AutoDetectParser and Metadata returns 
> "application/xml" for the URL http://www.w3.org/2002/07/owl#, where it should 
> be "application/rdf+xml". The correct mime type is also suggested here: 
> http://www.w3.org/TR/owl-ref/#MIMEType.
> P.S., Tika was downloaded from svn and built with Maven last week.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
