Thank you Don for the comments.

On Tue, Feb 8, 2011 at 4:06 PM, SpeedyChair <[email protected]> wrote:
> Another way to prepare a PDF document for tesseract is to use the 'convert'
> command from the ImageMagick package to split an image-only PDF file into a
> series of GrayScale TIFF images, one for each page. This convert command
> can work on just about any image. For PDF conversions, it actually makes
> ghostscript do all of the work. This same syntax also works with multi-page
> TIFF files and PostScript files.
>
> convert mydoc.pdf -type GrayScale -depth 8 -scene 1 mydoc-%03d.tif
>
> Then you would need to loop through the TIFF files to perform OCR on each
> page image. In a day or two, I will update my speedy-ocr bash script, which
> will now handle PDF image files.
>
> Don Marang
> Vinux Software Coordinator - vinux.org.uk
>
> There is just so much stuff in the world that, to me, is devoid of any real
> substance, value, and content that I just try to make sure that I am working
> on things that matter.
> Dean Kamen
>
> From: KHEM Sochenda
> Sent: Monday, February 07, 2011 10:23 PM
> To: [email protected]
> Subject: Re: VietOCR v2.0/3.1 & VietOCR.NET v2.0 Releases
>
> Dear Quan,
>
> I would like to know how to let tesseract OCR work with pdf documents.
>
> Thank you very much in advance for your kind response.
>
> With Best Regards,
>
> Sochenda
>
> On Tue, Feb 8, 2011 at 7:56 AM, Quan Nguyen <[email protected]> wrote:
>>
>> A Java/.NET GUI frontend for Tesseract OCR engine. The releases
>> include the following fixes and improvements:
>>
>> * Add support for spellcheck suggestion in context menu
>> * Improve program accessibility and usability
>> * Add support for downloading and installing language data packs and
>> appropriate spell dictionaries
>> * Add UI localization for Lithuanian and Slovak
>> * Update Tesseract OCR engine to 3.01 (r551) (v3.1 only)
>>
>> http://vietocr.sf.net
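
For the loop over the TIFF files that Don describes, a minimal sketch might look like the following. It is not Don's speedy-ocr script. It assumes the page images were produced by the convert command quoted above, so they are named <prefix>-NNN.tif, and it relies only on the basic "tesseract <image> <outputbase>" invocation. The function name ocr_pages and the OCR_CMD override variable are made up for this illustration.

```shell
#!/bin/sh
# Sketch: run OCR on every page image named <prefix>-NNN.tif.
# ocr_pages and OCR_CMD are hypothetical names, not part of tesseract
# or ImageMagick; OCR_CMD only lets you swap in another OCR command.

ocr_pages() {
    prefix=$1
    for page in "$prefix"-*.tif; do
        # Skip the unexpanded pattern when no page images exist.
        [ -e "$page" ] || continue
        # tesseract adds the .txt extension to the output base itself,
        # so mydoc-001.tif produces mydoc-001.txt.
        "${OCR_CMD:-tesseract}" "$page" "${page%.tif}" || return 1
    done
}

# Example use (commented out so sourcing this file has no side effects):
# convert mydoc.pdf -type GrayScale -depth 8 -scene 1 mydoc-%03d.tif
# ocr_pages mydoc && cat mydoc-*.txt > mydoc.txt
```

Because the page numbers are zero-padded by %03d, the shell expands mydoc-*.txt in page order for documents of up to 999 pages, so the final cat should assemble the text in the right sequence.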
-- You received this message because you are subscribed to the Google Groups "tesseract-ocr" group. To post to this group, send email to [email protected]. To unsubscribe from this group, send email to [email protected]. For more options, visit this group at http://groups.google.com/group/tesseract-ocr?hl=en.

