I have a valid PDF file for testing. When I pass its name to the test
program below, I get back:
got file ./test.pdf
Title: null
Author: null
Content:
Closing stream...
Can anyone see what I am doing wrong? This is Tika 0.9, which comes with
the latest Solr release.
Thanks - Tod
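import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.Parser;
import org.apache.tika.sax.BodyContentHandler;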
public class testTika {
    public static void main(String[] args) throws ClassNotFoundException {
        try {
            InputStream stream = new FileInputStream(new File(args[0]));
            System.err.println("got file: " + args[0]);
            try {
                Parser parser = new AutoDetectParser();
                BodyContentHandler textHandler = new BodyContentHandler();
                Metadata metadata = new Metadata();
                metadata.set(Metadata.RESOURCE_NAME_KEY, args[0]);
                ParseContext context = new ParseContext();
                parser.parse(stream, textHandler, metadata, context);
                System.out.println("Title: " + metadata.get(Metadata.TITLE));
                System.out.println("Author: " + metadata.get("Author"));
                System.out.println("Content: " + textHandler.toString());
            } finally {
                System.out.println("Closing stream...");
                stream.close();
            }
        } catch (Exception ge) {
            System.err.println("Problem ... bailing");
            ge.printStackTrace();
        }
    }
}