Personally, I would just use Image::Magick or GD to convert the .pdf into a .tiff and then simply have tesseract OCR it.
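As a rough sketch of that convert-then-OCR idea, here is one way it might look in Python, shelling out to the ImageMagick and tesseract command-line tools rather than using the Perl Image::Magick module. The function names, the 300 dpi default, and the output file naming are my own assumptions for illustration. It also assumes ImageMagick's `convert` (named `magick` in ImageMagick 7) and `tesseract` are on the PATH, and that Ghostscript is installed, since ImageMagick relies on it to read PDFs.

```python
import subprocess
from pathlib import Path


def build_commands(pdf_path, density=300):
    """Return the two command lines: PDF -> TIFF, then TIFF -> text.

    The 300 dpi density is an assumed default, not a documented
    requirement; adjust it for your scans.
    """
    pdf = Path(pdf_path)
    tiff = pdf.with_suffix(".tif")
    out_base = pdf.with_suffix("")  # tesseract appends ".txt" itself

    # Rasterize the PDF, drop the alpha channel, force 8 bits per channel.
    convert_cmd = ["convert", "-density", str(density), str(pdf),
                   "-alpha", "off", "-depth", "8", str(tiff)]
    # Basic tesseract invocation: input image, then output base name.
    ocr_cmd = ["tesseract", str(tiff), str(out_base)]
    return convert_cmd, ocr_cmd


def pdf_to_text(pdf_path, density=300):
    """Run both steps and return the path of the resulting text file."""
    convert_cmd, ocr_cmd = build_commands(pdf_path, density)
    subprocess.run(convert_cmd, check=True)  # raises if convert fails
    subprocess.run(ocr_cmd, check=True)      # raises if tesseract fails
    return Path(ocr_cmd[2] + ".txt")
```

Calling `pdf_to_text("scan.pdf")` would then write `scan.tif` and `scan.txt` next to the input, where `scan.pdf` is just a placeholder name.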
Someone else may have a better solution though.

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Eitan
Sent: Monday, January 04, 2010 8:10 AM
To: tesseract-ocr
Subject: Extracting text from PDF

Hi

I am a newbie...
Is there a standard way to extract text from PDF using tesseract-ocr?

Thanks

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to [email protected].
For more options, visit this group at http://groups.google.com/group/tesseract-ocr?hl=en.

