Hello,

I've been trying various pre-processing techniques to get the best 
result from Tesseract OCR, but parts of the output still contain errors.
I'm using the AForge libraries for C#.

The image I'm using is this one:

<https://lh3.googleusercontent.com/-0FtXGLwCKnE/Vboc6S32bHI/AAAAAAAADrc/HZ5Jmmsmfj0/s1600/IMG_20150724_192936.jpg>


These are the steps I'm doing right now:

1) Detect where the text starts and crop to avoid any issues when I 
binarize the image

2) Apply a Median filter (processing square size of 3)

3) Apply Gaussian sharpen (0.6 sigma)

4) Apply brightness correction (6 adjustValue)

5) Apply contrast stretch

6) Apply contrast correction (3 factor)

7) Apply saturation correction

8) Convert image to grayscale

9) Apply gamma correction (0.85 gamma)

10) Apply Bradley local thresholding

11) Get skew angle

12) Apply opening filter

13) Rotate image with angle obtained from step 11

14) OCR
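
To make step 10 concrete, here is a rough sketch of Bradley-style local thresholding using an integral image. It is plain Python for illustration only, not the AForge implementation I am actually calling; the function name `bradley_threshold` and the `window` and `limit` values are assumptions, not the settings from my pipeline.

```python
def bradley_threshold(gray, window=15, limit=0.15):
    """Binarize a grayscale image given as a list of rows of 0..255 ints.

    A pixel becomes black (0) when it is more than `limit` (as a fraction)
    darker than the mean of the `window` x `window` area around it;
    otherwise it becomes white (255). Illustrative sketch only.
    """
    height = len(gray)
    width = len(gray[0]) if height else 0

    # Integral image with an extra leading row and column of zeros, so that
    # integral[y][x] holds the sum of all pixels above and to the left.
    integral = [[0] * (width + 1) for _ in range(height + 1)]
    for y in range(height):
        row_sum = 0
        for x in range(width):
            row_sum += gray[y][x]
            integral[y + 1][x + 1] = integral[y][x + 1] + row_sum

    half = window // 2
    result = []
    for y in range(height):
        y0, y1 = max(0, y - half), min(height, y + half + 1)
        out_row = []
        for x in range(width):
            x0, x1 = max(0, x - half), min(width, x + half + 1)
            count = (y1 - y0) * (x1 - x0)
            # Sum of the local window from four integral-image lookups.
            total = (integral[y1][x1] - integral[y0][x1]
                     - integral[y1][x0] + integral[y0][x0])
            # Compare pixel * count with total to avoid a division.
            is_dark = gray[y][x] * count < total * (1.0 - limit)
            out_row.append(0 if is_dark else 255)
        result.append(out_row)
    return result
```

Because the threshold follows the local mean, uneven lighting across the receipt matters less than with a single global threshold, but the window has to be large enough relative to the stroke width of the text.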


Result:

CAFETERIA ESCLAT

AV.LLuis Companys,s/n.— T]f.934 772 965 j

FCO.GONZALEZ CASTRO

SANT JOAN D' ESPI

NIF:09747552Z-IVA NCLUIDO

TICKET 617431

VIE 24 JUL 2015 16:07

Cant Descripcion P.U. Tatil

2 MENU iiiiiiiii 10.00 20.00

l CAFE CON LEUHE 1.30 1.30

1 CAFE 1.10 1.10

TOTAL 22.40

EFECTIVO 22.40

**BASE IMPONIBLE 20.36 IVA 10% 2.04 **

... GRACIAS POR SU VISITA ...

JERU CAJA 1




Is there anything I could do to improve the output?

Thanks
