Hello, I've been trying several pre-processing techniques to get the best possible result from Tesseract OCR, but I'm still getting errors on parts of the data. I'm using the AForge libraries for C#.
The image I'm using is this one: <https://lh3.googleusercontent.com/-0FtXGLwCKnE/Vboc6S32bHI/AAAAAAAADrc/HZ5Jmmsmfj0/s1600/IMG_20150724_192936.jpg>

These are the steps I'm doing right now:

1) Detect where the text starts and crop, to avoid any issues when I binarize the image
2) Apply a median filter (processing square size of 3)
3) Apply Gaussian sharpen (sigma 0.6)
4) Apply brightness correction (adjustValue 6)
5) Apply contrast stretch
6) Apply contrast correction (factor 3)
7) Apply saturation correction
8) Convert image to grayscale
9) Apply gamma correction (gamma 0.85)
10) Apply Bradley local thresholding
11) Get skew angle
12) Apply opening filter
13) Rotate image by the angle obtained in step 11
14) OCR

Result:

CAFETERIA ESCLAT AV.LLuis Companys,s/n.— T]f.934 772 965 j FCO.GONZALEZ CASTRO SANT JOAN D' ESPI NIF:09747552Z-IVA NCLUIDO TICKET 617431 VIE 24 JUL 2015 16:07 Cant Descripcion P.U. Tatil 2 MENU iiiiiiiii 10.00 20.00 l CAFE CON LEUHE 1.30 1.30 1 CAFE 1.10 1.10 TOTAL 22.40 EFECTIVO 22.40 **BASE IMPONIBLE 20.36 IVA 10% 2.04 ** ... GRACIAS POR SU VISITA ... JERU CAJA 1

Is there anything I could do to improve the output?

Thanks
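To make step 10 concrete: as far as I understand it, Bradley local thresholding compares each pixel with the mean brightness of a window centered on it, and marks the pixel black when it is noticeably darker than that mean. Below is a minimal plain-Python sketch of that idea. It is not AForge's implementation; the function name `bradley_threshold`, the `window` size, and the `t` fraction are illustrative assumptions rather than the values my pipeline actually uses.

```python
def bradley_threshold(gray, window=15, t=0.15):
    """Binarize a grayscale image given as a list of rows of 0-255 ints.

    A pixel becomes black (0) when it is more than the fraction `t` darker
    than the mean of the `window` x `window` neighborhood centered on it
    (clipped at the image borders); otherwise it becomes white (255).
    `window` and `t` are illustrative defaults, not AForge's settings.
    """
    height = len(gray)
    width = len(gray[0]) if height else 0

    # Integral image with an extra leading row and column of zeros, so the
    # sum over any rectangle can be read with four lookups.
    integral = [[0] * (width + 1) for _ in range(height + 1)]
    for y in range(height):
        row_sum = 0
        for x in range(width):
            row_sum += gray[y][x]
            integral[y + 1][x + 1] = integral[y][x + 1] + row_sum

    half = window // 2
    out = [[255] * width for _ in range(height)]
    for y in range(height):
        y0, y1 = max(0, y - half), min(height, y + half + 1)
        for x in range(width):
            x0, x1 = max(0, x - half), min(width, x + half + 1)
            area = (y1 - y0) * (x1 - x0)
            total = (integral[y1][x1] - integral[y0][x1]
                     - integral[y1][x0] + integral[y0][x0])
            # pixel < mean * (1 - t), rewritten to avoid a division.
            if gray[y][x] * area < total * (1.0 - t):
                out[y][x] = 0
    return out
```

For example, calling `bradley_threshold(img, window=3)` on a uniformly bright image with one dark pixel marks only that pixel as black. Because the comparison is against the local mean rather than a global cutoff, uneven lighting across the receipt should matter less, which is why I binarize this way.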

