Did you get my message with the links to the package and source? I was surprised to see this older message from me just after I posted the links.

Don Marang
Vinux Software Development Coordinator (vinux.org.uk)

There is just so much stuff in the world that, to me, is devoid of any real substance, value, and content that I just try to make sure that I am working on things that matter.
Dean Kamen


--------------------------------------------------
From: "KHEM Sochenda" <[email protected]>
Sent: Tuesday, February 22, 2011 11:37 PM
To: <[email protected]>
Subject: Re: VietOCR v2.0/3.1 & VietOCR.NET v2.0 Releases

Thank you Don for the comments.

On Tue, Feb 8, 2011 at 4:06 PM, SpeedyChair <[email protected]> wrote:
Another way to prepare a PDF document for tesseract is to use the 'convert'
command from the ImageMagick package to split an image-only PDF file into a
series of grayscale TIFF images, one for each page.  The convert command
can read just about any image format.  For PDF conversions, it actually
hands all of the work to Ghostscript.  The same syntax also works with
multi-page TIFF files and PostScript files.

convert mydoc.pdf -type GrayScale -depth 8 -scene 1 mydoc-%03d.tif

Then you need to loop over the TIFF files and run OCR on each page image.
In a day or two, I will update my speedy-ocr bash script so that it
handles image-only PDF files.
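
The loop described above could look roughly like the sketch below. It is not
the speedy-ocr script, only an illustration under a few assumptions: the page
images follow the mydoc-NNN.tif naming produced by the convert command, the
helper name ocr_pages and the OCR_CMD override are made up for this example,
and tesseract is called in its basic "tesseract IMAGE OUTPUTBASE" form, which
writes OUTPUTBASE.txt.

```shell
# Hypothetical helper: OCR each page image named PREFIX-NNN.tif and
# join the per-page text into PREFIX.txt.
# OCR_CMD is an assumed override hook; it defaults to the tesseract binary.
ocr_pages() {
    prefix=$1
    ocr=${OCR_CMD:-tesseract}
    : > "$prefix.txt"                       # start with an empty combined file
    for page in "$prefix"-*.tif; do
        [ -e "$page" ] || continue          # the glob matched nothing
        base=${page%.tif}
        "$ocr" "$page" "$base" || return 1  # tesseract writes "$base.txt"
        cat "$base.txt" >> "$prefix.txt"    # append pages in filename order
    done
}
```

After running the convert command, calling "ocr_pages mydoc" would leave one
text file per page plus a combined mydoc.txt. The zero-padded page numbers
from %03d keep the pages in order when the shell expands the glob.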

Don Marang
Vinux Software Coordinator - vinux.org.uk


From: KHEM Sochenda
Sent: Monday, February 07, 2011 10:23 PM
To: [email protected]
Subject: Re: VietOCR v2.0/3.1 & VietOCR.NET v2.0 Releases

Dear Quan,

I would like to know how to make Tesseract OCR work with PDF documents.

Thank you very much in advance for your kind response.

With Best Regards,

Sochenda

On Tue, Feb 8, 2011 at 7:56 AM, Quan Nguyen <[email protected]> wrote:

VietOCR is a Java/.NET GUI frontend for the Tesseract OCR engine. The
releases include the following fixes and improvements:

* Add support for spellcheck suggestion in context menu
* Improve program accessibility and usability
* Add support for downloading and installing language data packs and
appropriate spell dictionaries
* Add UI localization for Lithuanian and Slovak
* Update Tesseract OCR engine to 3.01 (r551) (v3.1 only)

http://vietocr.sf.net

--
You received this message because you are subscribed to the Google Groups
"tesseract-ocr" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to
[email protected].
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en.

