[tesseract-ocr] Text output vs. PDF

Tobias Fritz Sat, 20 Jun 2015 13:46:15 -0700

Hi,

I'm using Tesseract 3.04 on OS X. It works very well, but I'm having trouble
with the searchable PDF output.


I ran tesseract on a TIFF file and had it create PDF output. The text file
that is created alongside it is almost accurate, apart from a few small
glitches. However, the text overlay in the PDF is not. In one place it
inserted spaces between the letters, like this: t h i s  i s  a n  e x a m p l e.

In another place it removed all the spaces, like this: thisisanexample.

What's the reason for this when the text file is almost perfect? How can I 
avoid this behavior?

Many thanks for any advice,

Tobias

