[tesseract-ocr] Tesseract 3.03: PDF-OCR generated PDFs show coding artefacts => do not use lossy (jpg) compression! Use lossless compression (png)!!

Tom Mon, 28 Jul 2014 01:04:28 -0700

Using the PDF-OCR option, I noticed that the mixed-mode PDFs generated by 
Tesseract (original page image plus OCR-ed text) show compression artefacts 
that were not present in the input image files. I use ImageMagick convert to 
render one image (png or bmp) per input PDF page.
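
A rough way to check which compression an output PDF uses is to look at the 
stream filters it declares: JPEG image data is stored with the DCTDecode 
filter, zlib/deflate (lossless) data with FlateDecode. The following is only 
a minimal sketch; the function names and the file name out.pdf are made up 
for illustration and are not part of Tesseract. It scans the raw bytes, so it 
will miss dictionaries stored inside compressed object streams, and it only 
looks at the first filter of each /Filter entry.

```python
import re
from collections import Counter

# Matches "/Filter /Name" and "/Filter [/Name ...]" in raw PDF bytes.
# Only the first filter name of each entry is captured.
FILTER_RE = re.compile(rb"/Filter\s*(?:\[\s*)?/([A-Za-z0-9]+)")


def count_stream_filters(pdf_bytes: bytes) -> Counter:
    """Count the first filter named in each /Filter entry found in the bytes."""
    return Counter(
        match.group(1).decode("ascii")
        for match in FILTER_RE.finditer(pdf_bytes)
    )


def uses_jpeg_compression(pdf_bytes: bytes) -> bool:
    """Return True if any stream declares DCTDecode (JPEG) as its first filter."""
    return count_stream_filters(pdf_bytes)["DCTDecode"] > 0


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python check_filters.py out.pdf
    if len(sys.argv) > 1:
        with open(sys.argv[1], "rb") as handle:
            data = handle.read()
        print(dict(count_stream_filters(data)))
        print("JPEG (lossy) streams present:", uses_jpeg_compression(data))
```

If the second line reports True for a PDF produced from png or bmp input, 
the page images were most likely re-encoded as JPEG on the way out.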


I therefore propose changing Tesseract's PDF-OCR mode so that, when rendering 
the final mixed-mode PDF output files, it

   - does not use lossy (jpg) compression
   - uses lossless compression (png) instead

