Re: Numbers & Noise

Cong Nguyen Mon, 21 Feb 2011 19:03:46 -0800

Dear Zvezdoslav Kunov,

I have some ideas for preprocessing:


1. Threshold the image. Two simple methods are worth comparing:
    - static threshold: keep pixels whose intensity is below a fixed value
    - adaptive threshold

2. Run connected-component analysis
    - filter objects/clusters based on their boundaries

3. Based on the median size of the objects'/clusters' boundaries, calculate
a scale factor (it depends on the trained character size) and scale the
image by it
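
To make the three steps concrete, here is a minimal sketch in plain Python. It is only an illustration under my own assumptions, not the code from the application linked below: the function names, the threshold of 128, the minimum height of 2 pixels and the trained character height of 30 pixels are all hypothetical, "boundary" is taken to mean the bounding box of a component, and only the static threshold is shown. The image is a list of rows of grey values (0 = black, 255 = white).

```python
from collections import deque
from statistics import median


def static_threshold(gray, t=128):
    """Step 1: mark pixels darker than t as foreground (1), the rest as 0."""
    return [[1 if p < t else 0 for p in row] for row in gray]


def component_boxes(mask):
    """Step 2: find 4-connected components with a breadth-first flood fill.

    Returns one bounding box (top, left, bottom, right) per component,
    in scan order of each component's first pixel.
    """
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(y, x)])
            top, left, bottom, right = y, x, y, x
            while queue:
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def filter_boxes(boxes, min_h=2, max_h=100):
    """Step 2 (filter): drop boxes whose height is outside [min_h, max_h]."""
    return [b for b in boxes if min_h <= b[2] - b[0] + 1 <= max_h]


def scale_factor(boxes, trained_height=30):
    """Step 3: ratio of the trained character height to the median box height."""
    heights = [b[2] - b[0] + 1 for b in boxes]
    return trained_height / median(heights)


# Tiny example: two 4-pixel-tall "characters" plus one noise pixel.
gray = [[255] * 8 for _ in range(6)]
for y in range(1, 5):
    for x in (1, 2, 4, 5):
        gray[y][x] = 0
gray[0][7] = 10  # isolated dark speck (noise)

kept = filter_boxes(component_boxes(static_threshold(gray)))
factor = scale_factor(kept)  # the image would then be resized by this factor
```

The resize itself is left out; any image library can do it once the factor is known. Using the median rather than the mean keeps a few oversized or undersized clusters from skewing the factor.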

After that, I think we should get good results.

Cong.

P/S: here are some illustrations of the approach:
extracted ROI (I cropped it manually :)):
https://picasaweb.google.com/congnguyenba/TesseractBasedOCR#5576335049023069730
scaled image:
https://picasaweb.google.com/congnguyenba/TesseractBasedOCR#5576335048933180546
Tesseract OCR recognition result for the scaled image:
https://picasaweb.google.com/congnguyenba/TesseractBasedOCR#5576335050086804194
You can find a simple application at: http://code.google.com/p/tesseractdotnet/

