I have been getting great results from Tesseract when the images are clear. However, many of my images are crummy. How can I get the best results from them? Better training, image pre-processing, or something else?
The original looks like this:
<https://lh5.googleusercontent.com/-Jz-VqLejc-U/VEEau_7k3oI/AAAAAAAAADU/bXOopmkgaSA/s1600/tessOriginal.png>

After some GraphicsMagick processing, I get this:
<https://lh3.googleusercontent.com/-wAD7kwouFUQ/VEEbBvCH4mI/AAAAAAAAADc/wEQagCMHyxk/s1600/tessAfterIM.png>

I am using Ubuntu 14.04, and the terminal shows:

    Tesseract Open Source OCR Engine v3.03 with Leptonica

The Tesseract output is, as expected, poor:

    uh it .0222? 1mm: (lenuimi 7.1: ft. tin“. Six'ori;v;sioxi ".nn’ ;(rt. n of an; 103 an Lam tmtns;hn RG 84; 37.14121 :13}: oven 1w ,4? 1 {2.2" “"D'ud<~‘1ii,xl,;l w ‘;’ tires LU. not, I,» ana'asienbnd in 1.3 arm apartment. .. em; mummy no film rzltnow‘n tau” 1m. and. ruijoiv‘zim: (mean in thin bleak. :an {"311 .25 33:03:) in Line :‘djomum Monks 1m :1: r a; w 231:9C101l3nt13‘ 1 are u.» rerun «S :2, vngfiunxulw CHUHLVlfiafln. ~.1. was tun iHEHHELgN MC
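As an example of the kind of image pre-processing I have in mind, here is a minimal sketch of global binarization using Otsu's method, in plain Python. It is not the GraphicsMagick processing I actually ran. It assumes the image has already been loaded as a 2D list of 8-bit grayscale values (0 = black, 255 = white); loading and saving are left out, and `otsu_threshold` and `binarize` are hypothetical helper names, not part of Tesseract, Leptonica, or GraphicsMagick.

```python
def otsu_threshold(pixels):
    """Return the Otsu threshold t for 8-bit grayscale values.

    Pixels <= t form the dark (text) class and pixels > t the light
    (background) class; t maximizes the between-class variance.
    """
    # Build a 256-bin histogram of intensities.
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(i * hist[i] for i in range(256))

    sum_dark = 0      # running sum of intensities in the dark class
    weight_dark = 0   # running pixel count of the dark class
    best_t, best_var = 0, -1.0

    for t in range(256):
        weight_dark += hist[t]
        if weight_dark == 0:
            continue              # no dark pixels yet
        weight_light = total - weight_dark
        if weight_light == 0:
            break                 # every pixel is now in the dark class
        sum_dark += t * hist[t]
        mean_dark = sum_dark / weight_dark
        mean_light = (sum_all - sum_dark) / weight_light
        # Between-class variance, up to a constant factor of 1 / total**2.
        var_between = weight_dark * weight_light * (mean_dark - mean_light) ** 2
        if var_between > best_var:
            best_var = var_between
            best_t = t
    return best_t


def binarize(image):
    """Map a 2D grayscale image to pure black (0) and white (255)."""
    flat = [p for row in image for p in row]
    t = otsu_threshold(flat)
    return [[255 if p > t else 0 for p in row] for row in image]


if __name__ == "__main__":
    sample = [[10, 20, 200], [15, 210, 220]]
    print(binarize(sample))
```

A single global threshold like this can struggle on unevenly lit or stained scans; in that case a local (adaptive) threshold, where each region gets its own cutoff, may work better.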

