Hello, friends
I want to configure Tika to use only PDFParser,
so I created a tika-config.xml with exactly this content:
<parser name="parse-pdf" class="org.apache.tika.parser.pdf.PDFParser">
  <mime>application/pdf</mime>
</parser>
And I have this test:
File file = getResourceAsFile("/test-documents/testPDF.pdf");
TikaConfig myTC = new TikaConfig(getResourceAsFile("/test-documents/tika-config.xml"));
String s1 = ParseUtils.getStringContent(file, myTC);
It fails on the last line with:
java.lang.NullPointerException
    at org.apache.tika.utils.ParseUtils.getStringContent(ParseUtils.java:111)
    at org.apache.tika.utils.ParseUtils.getStringContent(ParseUtils.java:170)
    at org.apache.tika.utils.ParseUtils.getStringContent(ParseUtils.java:188)
    at org.apache.tika.TestParsers.testOwnPDFParser(TestParsers.java:60)
Debugging shows that the configuration loaded from this tika-config.xml is incorrect.
The following one works fine:
<blabla>
  <parser name="parse-pdf" class="org.apache.tika.parser.pdf.PDFParser">
    <mime>application/pdf</mime>
  </parser>
</blabla>
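My guess, based only on the <blabla> experiment and not on reading the Tika source, is that TikaConfig looks for <parser> elements among the children of the root element. A file whose root is itself <parser> would then register no parser at all, which could explain the NullPointerException in ParseUtils. The sketch below models that assumed lookup with plain JDK DOM classes. ParserLookupSketch and findParserClasses are hypothetical names made up for illustration and are not part of the Tika API:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class ParserLookupSketch {

    // Hypothetical model of the lookup (an assumption, not Tika's code):
    // only <parser> elements that are direct children of the root count.
    static List<String> findParserClasses(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Element root = doc.getDocumentElement();
        List<String> classes = new ArrayList<>();
        NodeList children = root.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            // The root element itself is never inspected, only its children.
            if (child instanceof Element && "parser".equals(child.getNodeName())) {
                classes.add(((Element) child).getAttribute("class"));
            }
        }
        return classes;
    }

    public static void main(String[] args) throws Exception {
        String bare = "<parser name=\"parse-pdf\" class=\"org.apache.tika.parser.pdf.PDFParser\">"
                + "<mime>application/pdf</mime></parser>";
        String wrapped = "<blabla>" + bare + "</blabla>";
        System.out.println("bare:    " + findParserClasses(bare));
        System.out.println("wrapped: " + findParserClasses(wrapped));
    }
}
```

Running it prints an empty list for the bare file and [org.apache.tika.parser.pdf.PDFParser] for the wrapped one. That would be consistent with the first file failing and the wrapped one working, if the real TikaConfig behaves this way.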
Is there any documentation about Tika configuration, or at least a link to a
correct, well-formed tika-config.xml?
Thanks