Hello,
you also need to add commons-logging-1.1.1.jar to your classpath.
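A NoClassDefFoundError for org/apache/commons/logging/LogFactory means the JVM
could not find that class in any jar on the classpath. If it helps, here is a
small sketch that reports whether the classes involved can be loaded. It is not
part of PDFBox: the class name ClasspathCheck and the helper isOnClasspath are
made up for illustration, and the two class names checked are taken from your
command and error message.

```java
// Sketch: report whether the classes PDFBox needs can be loaded by the JVM.
class ClasspathCheck {

    /** Returns true if the named class can be located on the current classpath. */
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] required = {
            "org.apache.commons.logging.LogFactory", // expected in commons-logging-1.1.1.jar
            "org.apache.pdfbox.ExtractText"          // expected in pdfbox-0.8.0-incubating.jar
        };
        for (String name : required) {
            System.out.println(name + " -> " + (isOnClasspath(name) ? "found" : "MISSING"));
        }
    }
}
```

Assuming the compiled class sits in the current directory, you would run it
with the same classpath as the failing command plus the extra jar, for example:
java -cp .;commons-logging-1.1.1.jar;log4j-1.2.15.jar;pdfbox-0.8.0-incubating.jar ClasspathCheck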
To extract text from a PDF file (given as an InputStream), I'm using the
following method (it may not be the best one):
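import java.io.EOFException;
import java.io.InputStream;
import java.io.StringWriter;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.util.PDFTextStripper;

// "log" is assumed to be a logger field of the enclosing class.
private String parsePdfFile(InputStream stream) throws Exception {
    StringWriter output = new StringWriter(4096);
    PDDocument document = null;
    try {
        document = PDDocument.load(stream);
        if (document.isEncrypted()) {
            try {
                // Try the empty password; many "encrypted" PDFs open with it.
                document.decrypt("");
            } catch (Throwable e) {
                log.warn("Could not parse PDF File since the document is encrypted");
                return "";
            }
        }
        // Extract the text of all pages into the StringWriter.
        PDFTextStripper stripper = new PDFTextStripper();
        stripper.setStartPage(1);
        stripper.setEndPage(Integer.MAX_VALUE);
        stripper.writeText(document, output);
        return output.toString();
    } catch (EOFException eofe) {
        log.warn("EOF Exception parsing PDF Document");
        return "";
    } catch (Exception e) {
        log.info("Exception parsing PDF document", e);
        return "";
    } finally {
        // Always release the document, even if parsing failed.
        if (document != null) {
            try {
                document.close();
            } catch (Exception e) {
                /* ignore */
            }
        }
    }
}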
Regards,
Patrick
Nitin Shukla wrote:
Hello,
I am looking to extract text, along with details such as text location and
font, from PDF files, and I am looking for PDF libraries that can help me do
this. I came across PDFBox today and wanted to evaluate it.
I am looking for a quick tutorial that can help me get started with using the
PDFBox library to extract text from a PDF along with its font information,
text location, etc. Can anyone point me to a tutorial that shows how to use
the PDFBox APIs for this?
I tried running the command-line utility that is bundled with the PDFBox jar
to extract text as follows.
$ java -cp log4j-1.2.15.jar;pdfbox-0.8.0-incubating.jar org/apache/pdfbox/ExtractText
"D:\Test Lab\Murex Sample Reports\INVOICE00009.pdf" INVOICE00009.txt
But the above command threw the following error.
Exception in thread "main" java.lang.NoClassDefFoundError:
org/apache/commons/logging/LogFactory
I don't see org/apache/commons/logging/LogFactory in
pdfbox-0.8.0-incubating.jar or in log4j-1.2.15.jar. Can someone help point
out what I am doing wrong? Am I missing something?
Thanks and regards,
Nitin