Dear Zvezdoslav Kunov,
I have some ideas for preprocessing:
1. Threshold the image. Two simple methods are worth comparing:
- static threshold: keep the pixels whose intensity is below the threshold
- adaptive threshold
2. Run connected-component analysis
- filter objects/clusters based on their boundaries
3. From the median size of the objects'/clusters' boundaries, calculate a
scale factor (it depends on the trained character size) and scale the image by it
After that, I think we should get good results.
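
To make the three steps concrete, here is a rough sketch in plain Python (no image library; an image is a list of rows of 0-255 gray values). The function names, the threshold of 128, the window radius and offset, the minimum height used for filtering, and the "trained character height" are all placeholders chosen for illustration, not values taken from Tesseract or tesseractdotnet. The adaptive variant is a simple local-mean threshold, and the scaling is nearest-neighbour; a real application would probably use an image library for both.

```python
from collections import deque
from statistics import median

def static_threshold(gray, t=128):
    """Step 1a: mark pixels darker than t as foreground (1), the rest as 0."""
    return [[1 if px < t else 0 for px in row] for row in gray]

def adaptive_threshold(gray, radius=1, offset=10):
    """Step 1b: foreground if the pixel is darker than its local mean minus offset."""
    h, w = len(gray), len(gray[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [gray[j][i]
                      for j in range(max(0, y - radius), min(h, y + radius + 1))
                      for i in range(max(0, x - radius), min(w, x + radius + 1))]
            if gray[y][x] < sum(window) / len(window) - offset:
                out[y][x] = 1
    return out

def connected_components(binary):
    """Step 2: bounding boxes (x0, y0, x1, y1) of 4-connected foreground regions."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if not binary[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(x, y)])
            x0 = x1 = x
            y0 = y1 = y
            while queue:  # breadth-first flood fill of one component
                cx, cy = queue.popleft()
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if 0 <= nx < w and 0 <= ny < h and binary[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append((x0, y0, x1, y1))
    return boxes

def filter_boxes(boxes, min_height=2, max_height=1000):
    """Step 2 (filter): drop boxes whose height is outside an assumed plausible range."""
    return [b for b in boxes if min_height <= b[3] - b[1] + 1 <= max_height]

def scale_factor(boxes, trained_height):
    """Step 3: ratio of the (assumed) trained character height to the median box height."""
    if not boxes:
        return 1.0
    return trained_height / median(b[3] - b[1] + 1 for b in boxes)

def scale_nearest(img, factor):
    """Step 3 (apply): nearest-neighbour resize by the given factor."""
    h, w = len(img), len(img[0])
    nh, nw = max(1, round(h * factor)), max(1, round(w * factor))
    return [[img[min(h - 1, int(y / factor))][min(w - 1, int(x / factor))]
             for x in range(nw)] for y in range(nh)]
```

Usage would be along the lines of: `binary = static_threshold(gray)`, then `boxes = filter_boxes(connected_components(binary))`, then `scaled = scale_nearest(binary, scale_factor(boxes, 30))`, where 30 is a made-up trained character height in pixels.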
Cong.
P.S. Here are some illustrations of the approach:
extracted ROI (I cropped it manually :)):
https://picasaweb.google.com/congnguyenba/TesseractBasedOCR#5576335049023069730
scaled image:
https://picasaweb.google.com/congnguyenba/TesseractBasedOCR#5576335048933180546
Tesseract OCR recognition result for the scaled image:
https://picasaweb.google.com/congnguyenba/TesseractBasedOCR#5576335050086804194
You can find a simple application at: http://code.google.com/p/tesseractdotnet/