Hello good people! I've been having a go at Tesseract this week and
I've come across the following issue.

For some reason, any document set in Times New Roman (printed from
Microsoft Word 2007 or from WordPad on Windows 7) comes out as nothing
but gibberish characters after being OCR'd :-( I've been following the
training instructions, but after making the initial training box file,
its only contents (when opened in a text editor) are gibberish as
well.

I really can't work out how to resolve this, so I thought I'd ask here.
Does anyone know how to fix this particular issue?

Here's my TIF file if that helps: 
http://dev.ruben.hypotheekbond.nl/images/document_38.tif
(31 megabytes)

Thanks for the help!

