We have released version 1.3 of Tesseract Studio with the following
enhancements:
- Improved memory management to support large multi-page files:
  - Streaming interface to Leptonica.
  - Eliminate unnecessary caching of images.
  - Unload processed pages early.
  - Tested with a scanned file of 2,100 pages.
- New OCR options:
  - OCR only image objects in vector PDF files, or
  - Fully rasterize and OCR each page.
- New Save options:
  - Save as vector PDF with existing objects (including visible text)
    preserved and merged with OCR.
  - Save as searchable PDF where each page has a single image which
    overlays hidden OCR data.
    - Maintain original color if applicable, or
    - Convert to grayscale before saving, or
    - Convert to monochrome using a dithering algorithm, or
    - Convert to monochrome using dynamic or specified thresholding.
    - Specify or automatically assign resolution to control PDF size.
  - Save as text-only PDF.
    - Use a visible font for OCR and other text objects.
    - Pick standard Type 1 fonts to reduce PDF size.
    - Embed any available font into the PDF file (with some overhead).
    - Format OCR and other text to approximate the original layout
      (without graphics).
- Some bug fixes.
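
For readers comparing the two monochrome Save options, here is a minimal Python sketch of the difference between fixed thresholding and dithering. It is an illustration only, not Tesseract Studio's code: the function names `threshold` and `floyd_steinberg` are made up for this example, and Floyd-Steinberg error diffusion is assumed here because the release notes do not say which dithering algorithm is used.

```python
def threshold(gray, cutoff=128):
    """Map each 0-255 pixel to 0 (black) or 255 (white) using a fixed cutoff."""
    return [[255 if p >= cutoff else 0 for p in row] for row in gray]


def floyd_steinberg(gray):
    """Dither a 0-255 grayscale image to 0/255 by diffusing rounding error.

    Assumption: Floyd-Steinberg weights (7, 3, 5, 1)/16; the actual
    algorithm used by the application may differ.
    """
    h, w = len(gray), len(gray[0])
    buf = [[float(p) for p in row] for row in gray]  # working copy with error added
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            old = buf[y][x]
            new = 255 if old >= 128 else 0
            out[y][x] = new
            err = old - new
            # Push the quantization error onto pixels not yet visited.
            if x + 1 < w:
                buf[y][x + 1] += err * 7 / 16
            if y + 1 < h:
                if x > 0:
                    buf[y + 1][x - 1] += err * 3 / 16
                buf[y + 1][x] += err * 5 / 16
                if x + 1 < w:
                    buf[y + 1][x + 1] += err * 1 / 16
    return out


if __name__ == "__main__":
    flat = [[100] * 8 for _ in range(8)]  # uniform dark-gray test patch
    print(threshold(flat)[0])        # a single solid tone
    print(floyd_steinberg(flat)[0])  # a mix of black and white pixels
```

On a flat gray area, thresholding collapses everything to one tone, while dithering keeps an approximation of the gray level as a pattern of black and white pixels. Thresholding generally suits clean text scans; dithering tends to preserve photographs better.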
Download: https://github.com/OpaitSoftware/TesseractStudio.Net

Thank you,

Farhad Khalafi
Opait Software