Text extraction - Any tutorials?

Nitin Shukla Thu, 19 Nov 2009 03:24:55 -0800

Hello,

I am looking to extract text, text location, font and other details from a PDF
file, and I am looking for PDF libraries that can help me do this. I came across
PDFBox today and wanted to evaluate it.


I am looking for a quick tutorial that can help me get started using the PDFBox
library to extract text from a PDF along with its font information, text
location, etc. Can anyone point me to a tutorial that shows how to use the
PDFBox APIs for this?
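A minimal sketch of getting text together with its position and font, assuming the PDFBox 2.x API (`org.apache.pdfbox.text.PDFTextStripper` and `TextPosition`). Older releases such as 0.8.0-incubating keep these classes in a different package, and 3.x opens files with `Loader.loadPDF` instead of `PDDocument.load`, so treat the names below as version-specific. The class `FontAwareStripper` and its `Chunk` type are hypothetical names for illustration: the sketch overrides `writeString`, which the stripper calls for each run of text, and records the first glyph's position and font.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;
import org.apache.pdfbox.text.TextPosition;

/** Sketch (assumes PDFBox 2.x): records each text run with its position and font. */
public class FontAwareStripper extends PDFTextStripper {

    /** Hypothetical holder for one run of text and where/how it is drawn. */
    public static final class Chunk {
        public final String text;
        public final float x;          // horizontal position of the first glyph
        public final float y;          // vertical position, 0 at the top of the page
        public final String fontName;  // font of the first glyph (a run may mix fonts)
        public final float fontSizePt; // font size after text-matrix scaling

        Chunk(String text, float x, float y, String fontName, float fontSizePt) {
            this.text = text;
            this.x = x;
            this.y = y;
            this.fontName = fontName;
            this.fontSizePt = fontSizePt;
        }

        @Override
        public String toString() {
            return String.format("%s @ (%.1f, %.1f) %s %.1fpt", text, x, y, fontName, fontSizePt);
        }
    }

    private final List<Chunk> chunks = new ArrayList<>();

    public FontAwareStripper() throws IOException {
        super();
        setSortByPosition(true); // order runs by their location on the page
    }

    public List<Chunk> getChunks() {
        return chunks;
    }

    @Override
    protected void writeString(String text, List<TextPosition> positions) throws IOException {
        if (!positions.isEmpty()) {
            TextPosition first = positions.get(0);
            String font = first.getFont() != null ? first.getFont().getName() : "unknown";
            chunks.add(new Chunk(text, first.getXDirAdj(), first.getYDirAdj(),
                    font, first.getFontSizeInPt()));
        }
        super.writeString(text, positions); // keep the normal plain-text output too
    }

    // Usage: java FontAwareStripper input.pdf
    public static void main(String[] args) throws IOException {
        try (PDDocument doc = PDDocument.load(new File(args[0]))) {
            FontAwareStripper stripper = new FontAwareStripper();
            stripper.getText(doc); // drives writeString for every run on every page
            for (Chunk c : stripper.getChunks()) {
                System.out.println(c);
            }
        }
    }
}
```

Recording only the first glyph keeps the sketch short; if per-character detail is needed, iterate over all of `positions` instead.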


I tried running the command-line utility bundled with the PDFBox jar to extract
text, as follows:

$ java -cp log4j-1.2.15.jar;pdfbox-0.8.0-incubating.jar org/apache/pdfbox/ExtractText "D:\Test Lab\Murex Sample Reports\INVOICE00009.pdf" INVOICE00009.txt
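The bundled ExtractText tool is a thin wrapper around PDFBox's `PDFTextStripper`. A rough equivalent from Java code is sketched below, again assuming the PDFBox 2.x API (`PDDocument.load`, `org.apache.pdfbox.text.PDFTextStripper`); the class name `ExtractTextSketch` is hypothetical, and the output is written as UTF-8 as an assumption, not as ExtractText's documented behaviour.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

/** Sketch (assumes PDFBox 2.x): roughly what ExtractText does, from Java code. */
public class ExtractTextSketch {

    /** Returns the plain text of every page in the document. */
    public static String extract(PDDocument doc) throws IOException {
        PDFTextStripper stripper = new PDFTextStripper();
        stripper.setSortByPosition(true); // optional: order text by where it sits on the page
        return stripper.getText(doc);
    }

    // Usage: java ExtractTextSketch input.pdf output.txt
    public static void main(String[] args) throws IOException {
        try (PDDocument doc = PDDocument.load(new File(args[0]))) {
            Files.write(Paths.get(args[1]), extract(doc).getBytes(StandardCharsets.UTF_8));
        }
    }
}
```

Running this still needs PDFBox and its own dependencies on the classpath, just like the command-line tool.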

Running the command above, however, threw the following error:

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/commons/logging/LogFactory

I can't find org/apache/commons/logging/LogFactory in either
pdfbox-0.8.0-incubating.jar or log4j-1.2.15.jar. Can someone point out what I am
doing wrong? Am I missing something?
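A `NoClassDefFoundError` means a class that was available when PDFBox was compiled is missing from the runtime classpath. `org.apache.commons.logging` is the package of Apache Commons Logging, a separate library from log4j, so neither listed jar would be expected to contain it; its jar has to be added to `-cp` as well (for PDFBox 2.x, the standalone `pdfbox-app` jar bundles such dependencies). The standard-library-only sketch below, with the hypothetical class name `ClasspathCheck`, checks whether a class is loadable and whether a given jar actually contains its `.class` entry.

```java
import java.io.IOException;
import java.util.jar.JarFile;

/** Hypothetical helper: diagnose NoClassDefFoundError-style classpath problems. */
public class ClasspathCheck {

    /** True if the named class can be loaded from the current classpath. */
    public static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate and load the class, do not run static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /** True if the jar at jarPath has an entry for className, e.g. a/b/C.class. */
    public static boolean jarContainsClass(String jarPath, String className) throws IOException {
        String entry = className.replace('.', '/') + ".class";
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.getEntry(entry) != null;
        }
    }

    // Usage: java -cp <your classpath> ClasspathCheck some.jar other.jar
    public static void main(String[] args) throws IOException {
        String target = "org.apache.commons.logging.LogFactory";
        System.out.println(target + " loadable: " + isOnClasspath(target));
        for (String jarPath : args) {
            System.out.println(jarPath + " contains it: " + jarContainsClass(jarPath, target));
        }
    }
}
```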

Thanks and regards,
Nitin


