Indeed, for this particular image it's easy: open it in, say, Photoshop,
crop to the ROI, and experiment with how to mix the color channels so
that the text stands out clearly against the background. Then select a
suitable threshold value, and you're done. After that you should have no
difficulty coding the same steps into your program. If you'd rather not
code it yourself, try googling those keywords.
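
To make the two steps above concrete, here is a minimal sketch in plain
Python, not a definitive implementation. The function name, the channel
weights and the threshold are all assumptions made up for illustration;
the real values are whatever you found by experimenting in the image
editor. In practice you would likely use an imaging library rather than
nested lists.

```python
def mix_and_threshold(pixels, weights=(0.0, 1.0, 0.0), threshold=100):
    """Binarize an RGB image: weighted channel mix, then a fixed threshold.

    pixels    -- rows of (r, g, b) tuples, each value in 0..255
    weights   -- hypothetical per-channel weights, tuned by hand
    threshold -- hypothetical cut-off, tuned by hand
    Returns rows of 0 (dark, i.e. text) or 255 (background).
    """
    wr, wg, wb = weights
    result = []
    for row in pixels:
        out_row = []
        for r, g, b in row:
            # Collapse the three channels into one gray value.
            gray = wr * r + wg * g + wb * b
            out_row.append(0 if gray < threshold else 255)
        result.append(out_row)
    return result


# Tiny made-up example: near-black "text" pixels on a greenish background.
text_px = (10, 10, 10)
background_px = (120, 200, 130)
image = [
    [background_px, text_px, background_px],
    [background_px, background_px, text_px],
]
binary = mix_and_threshold(image)
```

With these made-up values only the green channel is used, since that is
where the background is brightest and the contrast with black text is
largest.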

The problem arises when you want this algorithm to be fully automated.
The images you pass to it can differ significantly in many respects.
Fixed channel percentages and thresholds won't suffice then; you'll
need to implement something more intelligent.
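
As one example of what "more intelligent" could mean, the threshold can
be chosen per image instead of being fixed. The sketch below uses Otsu's
method, which is my own choice of a common technique and not something
prescribed above; it picks the threshold that best separates the gray
values into a dark and a bright class. The function name is hypothetical,
and the input is assumed to be a flat list of integer gray values in
0..255 (for instance, the result of a channel mix).

```python
def otsu_threshold(gray_values):
    """Return a threshold t; values < t form the dark class.

    Picks the split that maximizes the between-class variance
    (Otsu's method) over a 256-bin histogram.
    """
    hist = [0] * 256
    for v in gray_values:
        hist[v] += 1

    total = len(gray_values)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels in the dark class so far
    sum0 = 0  # sum of gray levels in the dark class so far
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue  # dark class still empty
        w1 = total - w0
        if w1 == 0:
            break     # bright class empty, nothing left to split
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        # Between-class variance, up to a constant factor.
        between = w0 * w1 * (mean0 - mean1) ** 2
        if between > best_var:
            best_var = between
            best_t = t + 1  # levels <= t are dark, i.e. values < t + 1
    return best_t


# Made-up data: half dark "text" values, half bright "background" values.
values = [10] * 50 + [200] * 50
t = otsu_threshold(values)
dark = [v for v in values if v < t]
```

This only automates the threshold. Choosing the channel mix
automatically, and coping with uneven lighting from a camera, would need
further work.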

Warm regards,
Dmitri Silaev
www.CustomOCR.com





On Tue, Jun 21, 2011 at 10:24 AM, Felipe Leal Coutinho
<[email protected]> wrote:
> Hello,
>
> I'm trying to use tesseract to OCR bank cheques captured with digital
> cameras. As you can see
> (http://dl.dropbox.com/u/24085540/cheque-exemplo.jpg), these documents
> have black text on a colored background (there is no black in the
> background). In order to improve the results, I think I will need to
> do some pre-processing. Do you suggest something? I was thinking of
> removing the background, but I haven't found any method to do that.
>
> Regards,
>
> Felipe.
>
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to [email protected]
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en
>
