Thanks for the command line you provided for the benefit of the community. I would also like to have your images.
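While waiting for the images, here is a rough way to check whether an output PDF contains lossy-compressed images at all. It is only a sketch: the function name `pdf_has_lossy_filter` is made up for this mail, and it assumes the image dictionaries in the PDF are stored as plain text, so that the standard PDF filter names `/DCTDecode` (JPEG) and `/JPXDecode` (JPEG 2000) are visible to grep. It does not look at any other filters.

```shell
#!/bin/sh
# Sketch: report whether a PDF mentions a lossy image filter.
# pdf_has_lossy_filter is a hypothetical helper name, not part of Tesseract.
# Assumption: image stream dictionaries are not hidden from a plain text search.
pdf_has_lossy_filter() {
    # LC_ALL=C makes grep treat the file as raw bytes; -q prints nothing
    # and only sets the exit status.
    if LC_ALL=C grep -q -e '/DCTDecode' -e '/JPXDecode' "$1"; then
        echo "lossy"
    else
        echo "no lossy filter found"
    fi
}
```

After sourcing the function, `pdf_has_lossy_filter 1.png.pdf` could be run on the file produced by the tesseract command quoted below. A "lossy" result would be consistent with the artefacts described, though it would not by itself prove where they come from.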
On Tue, Jul 29, 2014 at 1:30 AM, Tom <[email protected]> wrote:

> Commandline:
>
> # the convert command (part of ImageMagick) creates a clean lossless
> # compressed image 1.png
> # if you already have a png with characters and digits in it, you do not
> # need the following command:
> convert -density 300x300 -depth 8 1.pdf 1.png
>
> # the Tesseract is called and creates a mixed mode pdf with filename
> # "1.png.pdf"
> # this output shows coding artefacts between the characters and digits if
> # you enlarge the view
> # I can supply you with images (on request)
> tesseract -l eng 1.png 1.png pdf
>
> On Monday, 28 July 2014 09:52:50 UTC+2, Tom wrote:
>
>> Using the PDF-OCR option I noticed that the Tesseract-generated
>> mixed-mode PDFs (original image-PDF plus OCR-ed text) show coding artefacts
>> which were not present in the input image files (I use ImageMagick convert
>> to render one image (png or bmp) per PDF-input-page).
>>
>> So I propose to change Tesseract PDF-OCR mode
>>
>> - do not use lossy compression
>> - use lossless compression (png)
>>
>> when rendering the final mixed-mode PDF output files.
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/5b80105f-8db1-42bb-bf2d-3806ea0c052f%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

