Moving from tika-dev, hopefully Dave's watching...
On Nov 17, 2008, at 5:54 PM, Dave Meikle wrote:
Hi Grant,
2008/11/15 Grant Ingersoll <[EMAIL PROTECTED]>
Is text/xml a supported mime-type? It doesn't appear to be. I notice that application/xml maps to the DcXMLParser. There is also an XMLParser, but it doesn't seem to be mapped to anything. Is this a bug or a feature?
text/xml is defined as an alias of application/xml and is, as you say, mapped to the DcXMLParser.
In TestParsers, I tried:
public void testXMLExtraction() throws Exception {
    File file = getTestFile("testXML.xml");
    String s1 = ParseUtils.getStringContent(file, tc);
    String s2 = ParseUtils.getStringContent(file, tc, "application/xml");
    assertEquals(s1, s2);
    String s3 = ParseUtils.getStringContent(file, tc, "text/xml");
    assertEquals(s1, s3);
}
and I get:
java.lang.NullPointerException
    at org.apache.tika.utils.ParseUtils.getStringContent(ParseUtils.java:173)
    at org.apache.tika.utils.ParseUtils.getStringContent(ParseUtils.java:232)
    at org.apache.tika.TestParsers.testXMLExtraction(TestParsers.java:94)
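One possible reading of the trace, offered only as a guess because the thread does not confirm it: if the parser lookup inside ParseUtils.getStringContent matches the MIME-type string exactly, an alias such as text/xml would find no parser, and a later call on that null result would throw. The Java sketch that follows is a hypothetical, self-contained registry. MimeAliasRegistry and its methods are invented for illustration and are not Tika's API. It contrasts an exact-match lookup with one that resolves aliases to their canonical type first.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch for illustration only: not Tika's real registry API.
class MimeAliasRegistry {
    // canonical MIME type -> parser name
    private final Map<String, String> parsers = new HashMap<>();
    // alias MIME type -> canonical MIME type
    private final Map<String, String> aliases = new HashMap<>();

    void registerParser(String canonicalType, String parserName) {
        parsers.put(canonicalType, parserName);
    }

    void registerAlias(String alias, String canonicalType) {
        aliases.put(alias, canonicalType);
    }

    // Map an alias to its canonical type; unknown types pass through unchanged.
    String canonicalize(String type) {
        return aliases.getOrDefault(type, type);
    }

    // Exact-match lookup: an alias is not a key in the parser table, so it yields null.
    String lookupExact(String type) {
        return parsers.get(type);
    }

    // Alias-aware lookup: canonicalize first, then consult the parser table.
    String lookup(String type) {
        return parsers.get(canonicalize(type));
    }
}
```

With application/xml registered to "DcXMLParser" and text/xml registered as its alias, lookup("text/xml") returns "DcXMLParser", while lookupExact("text/xml") returns null.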
DcXMLParser is a subclass of XMLParser designed to add extraction of Dublin Core metadata, if present, to the default XMLParser extraction. It does this by reusing XMLParser's default handler, a TextContentHandler, which it obtains via the super.getDefaultHandler() call at the start of its own getDefaultHandler method.
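To illustrate the pattern Dave describes, here is a minimal sketch that uses plain SAX from the JDK rather than Tika's own classes. SketchXMLParser and SketchDcXMLParser are hypothetical names, and their method signatures are assumptions rather than Tika's. The base handler simply collects character data, standing in for TextContentHandler. The subclass obtains that handler through super.getDefaultHandler(...) and wraps it so that Dublin Core elements (namespace http://purl.org/dc/elements/1.1/) are also recorded as metadata.

```java
import java.io.StringReader;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical base parser: its default handler only gathers text.
class SketchXMLParser {
    protected DefaultHandler getDefaultHandler(StringBuilder text, Map<String, String> metadata) {
        return new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        };
    }

    public String parse(String xml, Map<String, String> metadata) throws Exception {
        StringBuilder text = new StringBuilder();
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // needed so uri/localName are populated
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)),
                getDefaultHandler(text, metadata));
        return text.toString();
    }
}

// Hypothetical subclass: reuses the parent's handler and adds Dublin Core capture.
class SketchDcXMLParser extends SketchXMLParser {
    private static final String DC_NS = "http://purl.org/dc/elements/1.1/";

    @Override
    protected DefaultHandler getDefaultHandler(StringBuilder text, Map<String, String> metadata) {
        // Start from the parent's text-extraction handler.
        final DefaultHandler base = super.getDefaultHandler(text, metadata);
        return new DefaultHandler() {
            private String currentDc; // local name of the open dc:* element, if any
            private final StringBuilder value = new StringBuilder();

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts)
                    throws SAXException {
                if (DC_NS.equals(uri)) {
                    currentDc = localName;
                    value.setLength(0);
                }
                base.startElement(uri, localName, qName, atts);
            }

            @Override
            public void endElement(String uri, String localName, String qName) throws SAXException {
                if (DC_NS.equals(uri) && localName.equals(currentDc)) {
                    metadata.put(currentDc, value.toString().trim());
                    currentDc = null;
                }
                base.endElement(uri, localName, qName);
            }

            @Override
            public void characters(char[] ch, int start, int length) throws SAXException {
                if (currentDc != null) {
                    value.append(ch, start, length);
                }
                base.characters(ch, start, length); // text extraction is unchanged
            }
        };
    }
}
```

Parsing `<doc xmlns:dc="http://purl.org/dc/elements/1.1/"><dc:title>Hello</dc:title><p>World</p></doc>` with SketchDcXMLParser yields the same text as the base class ("HelloWorld") and adds the metadata entry title=Hello. Because every callback is forwarded to the parent's handler, the subclass only adds behavior and leaves text extraction identical.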
Cheers,
Dave