Moving from tika-dev, hopefully Dave's watching...

On Nov 17, 2008, at 5:54 PM, Dave Meikle wrote:

Hi Grant,

2008/11/15 Grant Ingersoll <[EMAIL PROTECTED]>

Is text/xml a supported mime-type? It doesn't appear to be. I notice there
is application/xml that maps to the DcXMLParser, but I also notice there is
an XMLParser, but it doesn't seem to be mapped to anything. Is this a bug
or a feature?


text/xml is defined as an alias of application/xml, which is, as you say,
mapped to the DcXMLParser.
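
To make the alias relationship concrete, here is a rough, self-contained sketch of what resolving an alias before the parser lookup might look like. The class MimeAliasSketch and its methods are hypothetical names made up for illustration; this is not Tika's actual registry code, only the general idea of mapping text/xml onto application/xml first.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only: NOT Tika's API.
public class MimeAliasSketch {
    // canonical mime type -> parser name
    private final Map<String, String> parsers = new HashMap<>();
    // alias -> canonical mime type
    private final Map<String, String> aliases = new HashMap<>();

    void register(String canonicalType, String parserName) {
        parsers.put(canonicalType, parserName);
    }

    void addAlias(String alias, String canonicalType) {
        aliases.put(alias, canonicalType);
    }

    // Resolve the alias first, then look up the parser.
    // Returns null when neither the type nor its alias target is mapped.
    String parserFor(String mimeType) {
        String canonical = aliases.getOrDefault(mimeType, mimeType);
        return parsers.get(canonical);
    }

    public static void main(String[] args) {
        MimeAliasSketch registry = new MimeAliasSketch();
        registry.register("application/xml", "DcXMLParser");
        registry.addAlias("text/xml", "application/xml");

        System.out.println(registry.parserFor("application/xml"));
        System.out.println(registry.parserFor("text/xml"));
    }
}
```

With a lookup like this, both type names end up at the same parser. A lookup that skips the alias step would return null for text/xml, so any caller that uses the result without a null check would fail.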

In TestParsers, I tried:
    public void testXMLExtraction() throws Exception {
        File file = getTestFile("testXML.xml");
        String s1 = ParseUtils.getStringContent(file, tc);
        String s2 = ParseUtils.getStringContent(file, tc, "application/xml");
        assertEquals(s1, s2);
        String s3 = ParseUtils.getStringContent(file, tc, "text/xml");
        assertEquals(s1, s3);
    }

and I get:

java.lang.NullPointerException
        at org.apache.tika.utils.ParseUtils.getStringContent(ParseUtils.java:173)
        at org.apache.tika.utils.ParseUtils.getStringContent(ParseUtils.java:232)
        at org.apache.tika.TestParsers.testXMLExtraction(TestParsers.java:94)




DcXMLParser is a sub-class of XMLParser designed to add the extraction of
Dublin Core metadata, if it exists, to the default XMLParser extraction. It
does this by using XMLParser's getDefaultHandler method, which returns a
TextContentHandler, via the super.getDefaultHandler() call at the start of
its getDefaultHandler method.
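
As a rough illustration of that pattern, the sketch below uses plain JDK SAX classes rather than Tika's own. BaseXmlParser and DcXmlParser are hypothetical stand-ins (not the real XMLParser and DcXMLParser), and the way metadata is collected here is an assumption made for the example. The only point is the shape: the subclass fetches the parent's handler via super.getDefaultHandler() and layers the Dublin Core extraction on top of it.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration only: these are NOT the Tika classes.
public class HandlerChainSketch {

    /** Stand-in for the base XML parser: collects character data only. */
    static class BaseXmlParser {
        final StringBuilder text = new StringBuilder();

        DefaultHandler getDefaultHandler() {
            return new DefaultHandler() {
                @Override
                public void characters(char[] ch, int start, int length) {
                    text.append(ch, start, length);
                }
            };
        }

        void parse(String xml) throws Exception {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // needed to see the dc: namespace URI
            factory.newSAXParser()
                   .parse(new InputSource(new StringReader(xml)), getDefaultHandler());
        }
    }

    /** Stand-in for the Dublin Core subclass: same text, plus dc:* metadata. */
    static class DcXmlParser extends BaseXmlParser {
        static final String DC_NS = "http://purl.org/dc/elements/1.1/";
        final Map<String, String> metadata = new LinkedHashMap<>();

        @Override
        DefaultHandler getDefaultHandler() {
            // Start from the parent's handler so text extraction is unchanged.
            final DefaultHandler base = super.getDefaultHandler();
            return new DefaultHandler() {
                private StringBuilder current; // non-null while inside a dc:* element
                private String currentName;

                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes atts) {
                    if (DC_NS.equals(uri)) {
                        current = new StringBuilder();
                        currentName = localName;
                    }
                }

                @Override
                public void characters(char[] ch, int start, int length)
                        throws SAXException {
                    base.characters(ch, start, length); // default text extraction
                    if (current != null) {
                        current.append(ch, start, length);
                    }
                }

                @Override
                public void endElement(String uri, String localName, String qName) {
                    if (current != null && DC_NS.equals(uri)) {
                        metadata.put(currentName, current.toString().trim());
                        current = null;
                    }
                }
            };
        }
    }

    public static void main(String[] args) throws Exception {
        String xml = "<doc xmlns:dc=\"http://purl.org/dc/elements/1.1/\">"
                   + "<dc:title>Test</dc:title><body>Hello</body></doc>";
        DcXmlParser parser = new DcXmlParser();
        parser.parse(xml);
        System.out.println(parser.text);
        System.out.println(parser.metadata);
    }
}
```

Because the subclass delegates every characters() call to the parent's handler, the extracted text should be the same whichever of the two parsers a mime type is mapped to. Only the metadata differs.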

Cheers,
Dave

