Thanks for the command line you furnished for the benefit of the community. I
would also like to have your images.


On Tue, Jul 29, 2014 at 1:30 AM, Tom <[email protected]> wrote:

> Command line:
>
> # The convert command (part of ImageMagick) creates a clean, losslessly
> # compressed image 1.png.
> # If you already have a PNG with characters and digits in it, you do not
> # need the following command:
> convert -density 300x300 -depth 8 1.pdf 1.png
>
> # Tesseract is then called and creates a mixed-mode PDF with the filename
> # "1.png.pdf".
> # If you enlarge the view, this output shows coding artefacts between the
> # characters and digits.
> # I can supply you with images on request.
> tesseract -l eng 1.png 1.png pdf
>
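For a multi-page PDF, the two commands can be wrapped in a small shell function that renders one lossless PNG per page and then runs Tesseract on each page. This is a minimal sketch, not part of Tom's message: the function name `ocr_pdf_pages` and the `page-%03d.png` naming scheme are hypothetical, and it assumes `convert` (ImageMagick, with Ghostscript available for PDF input; `magick` in ImageMagick 7) and a Tesseract build that ships the `pdf` config are on the PATH.

```shell
#!/bin/sh
# Hypothetical helper (illustrative name): render every page of a PDF to an
# 8-bit PNG at 300 dpi, then let Tesseract build one mixed-mode PDF per page.
# Assumes ImageMagick's `convert` and a Tesseract with the `pdf` config.
ocr_pdf_pages() {
  input="$1"
  prefix="${2:-page}"
  # %03d in the output name makes ImageMagick write one file per page:
  # page-000.png, page-001.png, ...
  convert -density 300x300 -depth 8 "$input" "${prefix}-%03d.png" || return 1
  for png in "${prefix}"-[0-9][0-9][0-9].png; do
    [ -e "$png" ] || continue   # the glob matched nothing
    # Pass the name without ".png" as the output base, so Tesseract
    # writes e.g. page-000.pdf rather than page-000.png.pdf.
    tesseract -l eng "$png" "${png%.png}" pdf || return 1
  done
}
```

Calling `ocr_pdf_pages 1.pdf page` should then leave `page-000.pdf`, `page-001.pdf`, and so on in the current directory. Stripping the `.png` suffix from the output base is only a naming choice; Tom's original command keeps it and therefore gets `1.png.pdf`.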
> On Monday, 28 July 2014 09:52:50 UTC+2, Tom wrote:
>
>> Using the PDF-OCR option, I noticed that the mixed-mode PDFs generated by
>> Tesseract (the original page image plus the OCR-ed text) show coding
>> artefacts that were not present in the input image files. (I use ImageMagick
>> convert to render one image, PNG or BMP, per PDF input page.)
>>
>> So I propose changing Tesseract's PDF-OCR mode so that, when rendering the
>> final mixed-mode PDF output files, it
>>
>>    - does not use lossy compression
>>    - uses lossless compression (PNG)
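To check whether a Tesseract-generated PDF really embeds its page image with lossy JPEG (DCTDecode) compression, as the artefacts suggest, one option is to list the embedded images with `pdfimages -list` from poppler-utils. This is a sketch under stated assumptions: the helper names are hypothetical, and the parsing assumes the usual `pdfimages -list` layout (a header line, a dashed rule, then one row per image with the encoding in the ninth column).

```shell
#!/bin/sh
# Hypothetical helper: print the "enc" column of `pdfimages -list` for every
# image embedded in a PDF, e.g. "jpeg" for DCTDecode, or "image" for
# uncompressed/Flate-compressed (lossless) data.
# Assumes poppler-utils' pdfimages with the -list option.
pdf_image_encodings() {
  # Skip the header line and the dashed rule; the 9th field is the encoding.
  pdfimages -list "$1" | awk 'NR > 2 { print $9 }'
}

# Hypothetical helper: exit status 0 if any embedded image is JPEG (lossy).
pdf_has_jpeg_images() {
  pdf_image_encodings "$1" | grep -qx 'jpeg'
}
```

For example, `pdf_has_jpeg_images 1.png.pdf && echo lossy` would print `lossy` if the page image in the output file from the command line above is JPEG-encoded. Only `jpeg` is treated as lossy here; other encodings such as JBIG2 can also be lossy, so a stricter check would need more cases.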
>>
>>
>>  --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/5b80105f-8db1-42bb-bf2d-3806ea0c052f%40googlegroups.com.
>
> For more options, visit https://groups.google.com/d/optout.
>
