[tesseract-ocr] how can I get better results for this

Rick Leir Fri, 17 Oct 2014 08:06:42 -0700

I have been getting great results from Tesseract when the images are clear.
However, many of my images are crummy.
How would you get the best results from images like these? Improved training,
image pre-processing, or something else?


The original looks like this:

<https://lh5.googleusercontent.com/-Jz-VqLejc-U/VEEau_7k3oI/AAAAAAAAADU/bXOopmkgaSA/s1600/tessOriginal.png>

After some pre-processing with GraphicsMagick, I get this:

<https://lh3.googleusercontent.com/-wAD7kwouFUQ/VEEbBvCH4mI/AAAAAAAAADc/wEQagCMHyxk/s1600/tessAfterIM.png>
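
The GraphicsMagick commands used above were not posted, so the following is not that pipeline. It is a minimal sketch of one common pre-processing approach for noisy scans: a 3x3 median filter to remove isolated specks, then a global Otsu threshold to produce a pure black-and-white page. It assumes the image is already loaded as a 2-D list of 8-bit grayscale values (loading and saving, e.g. as PGM, is left out), and the function names are hypothetical.

```python
# Hypothetical helpers: despeckle + binarize, NOT the poster's GraphicsMagick
# steps. Assumes img is a 2-D list of 0-255 grayscale ints, 0 = black.
from statistics import median


def median_filter_3x3(img):
    """Replace each pixel with the median of its 3x3 neighbourhood.

    An isolated speck is outvoted by its neighbours. Edge pixels reuse the
    nearest row/column (clamped indices).
    """
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [
                img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
            ]
            row.append(median(window))  # odd count: returns one of the 9 values
        out.append(row)
    return out


def otsu_threshold(img):
    """Return the t maximising between-class variance (class 0: pixels <= t)."""
    hist = [0] * 256
    for row in img:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * c for i, c in enumerate(hist))
    weight_bg = 0
    sum_bg = 0
    best_t, best_var = 0, -1.0
    for t in range(256):
        weight_bg += hist[t]
        if weight_bg == 0:
            continue  # no pixels at or below t yet
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break  # every pixel is already in class 0
        sum_bg += t * hist[t]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def binarize(img, t):
    """Map pixels above t to white (255) and the rest to black (0)."""
    return [[255 if v > t else 0 for v in row] for row in img]


def clean_for_ocr(img):
    """Despeckle first so noise does not skew the threshold, then binarize."""
    smoothed = median_filter_3x3(img)
    return binarize(smoothed, otsu_threshold(smoothed))
```

A single global threshold can struggle with uneven lighting or show-through from the reverse side of the page; an adaptive (local) threshold, or upscaling small text before binarizing, may work better on scans like these.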
I am using Ubuntu 14.04, and see this in the terminal:
   Tesseract Open Source OCR Engine v3.03 with Leptonica

As expected, the Tesseract output is poor:

uh it .0222? 1mm: (lenuimi 7.1: ft. tin“. Six'ori;v;sioxi ".nn’

;(rt. n of an; 103 an Lam tmtns;hn

RG 84;

37.14121 :13}: oven 1w ,4? 1 {2.2" “"D'ud<~‘1ii,xl,;l w ‘;’ tires LU.

not, I,» ana'asienbnd in 1.3 arm apartment. .. em; mummy no film rzltnow‘n 
tau” 1m. and. ruijoiv‘zim: (mean in thin bleak. :an {"311 .25 33:03:) in 
Line :‘djomum Monks 1m :1: r a; w 231:9C101l3nt13‘ 1 are u.» rerun «S :2, 
vngﬁunxulw CHUHLVlﬁaﬂn. ~.1. was tun iHEHHELgN MC