Hello, I am looking to extract text from PDF files along with details such as text location and font, and I am looking for PDF libraries that can help me do this. I came across PDFBox today and would like to evaluate it.
I am looking for a quick tutorial to help me get started with using the PDFBox library to extract text from a PDF together with its font information, text location, etc. Can anyone point me to a tutorial that shows how to use the PDFBox APIs for this?

I tried running the command-line utility that is bundled with the PDFBox jar to extract text, as follows:

    $ java -cp log4j-1.2.15.jar;pdfbox-0.8.0-incubating.jar org/apache/pdfbox/ExtractText "D:\Test Lab\Murex Sample Reports\INVOICE00009.pdf" INVOICE00009.txt

But the above command threw the following error:

    Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/commons/logging/LogFactory

I don't see org/apache/commons/logging/LogFactory in either pdfbox-0.8.0-incubating.jar or log4j-1.2.15.jar. Can someone point out what I am doing wrong? Am I missing something?

Thanks and regards,
Nitin
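P.S. In case it helps anyone reproduce the problem, here is a minimal sketch of one way to check whether a given class is present in a set of jars. It uses only the standard java.util.jar API. The class name JarClassCheck and the method containsClass are hypothetical names made up for this illustration; they are not part of PDFBox or log4j.

```java
import java.io.File;
import java.io.IOException;
import java.util.jar.JarFile;

// Hypothetical helper (not part of PDFBox): reports which of the given jars
// contain a .class entry for the given class name.
public class JarClassCheck {

    // Returns true if the jar has an entry for the class. Both
    // "org.apache.commons.logging.LogFactory" and
    // "org/apache/commons/logging/LogFactory" map to the entry
    // "org/apache/commons/logging/LogFactory.class".
    static boolean containsClass(File jar, String className) throws IOException {
        String entryName = className.replace('.', '/') + ".class";
        try (JarFile jarFile = new JarFile(jar)) {
            return jarFile.getEntry(entryName) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: java JarClassCheck <class-name> <jar>...");
            return;
        }
        // args[0] is the class name; every remaining argument is a jar path.
        for (int i = 1; i < args.length; i++) {
            File jar = new File(args[i]);
            String result = containsClass(jar, args[0]) ? "found" : "missing";
            System.out.println(jar.getName() + ": " + result);
        }
    }
}
```

Assuming the two jars are in the current directory, it could be run as: java JarClassCheck org.apache.commons.logging.LogFactory log4j-1.2.15.jar pdfbox-0.8.0-incubating.jar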

