On Tue, 17 Sep 2013, Malcolm Poole wrote:
> [snip]
> I made a test file using a red font and larger text and
> tesseract was able to read it but not very well. The short
> answer seems to be: Tesseract works better with grayscale
> images.
I believe that Tesseract converts everything to black
and white internally before doing the OCR. I think
it makes sense to do the conversion yourself before
passing the image to Tesseract, so that you can check
that the result looks right. There are various
algorithms for binarizing images, and it may be worth
searching the archives of this list for one that works
well with the kind of images you want to process. You
may also be able to key on certain colours and
ignore others. There are many ways to tailor the
binarization, and it may be possible to improve on
the default scheme used by Tesseract.
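As an illustration of doing the binarization yourself, here is a minimal sketch in plain Python with no imaging library. It makes three assumptions:

- The image is already decoded into rows of (r, g, b) tuples. Loading and saving, e.g. with Pillow, is left out.
- `binarize_otsu` uses Otsu's global threshold, which is one standard algorithm among many. As far as I know, Tesseract 3.x's own internal thresholding is also Otsu-based.
- `binarize_by_colour` shows the "key on one colour" idea.

The function names, the BT.601 grey weights and the tolerance of 60 are illustrative choices, not anything taken from Tesseract.

```python
# Minimal sketch: binarize an image yourself before handing it to Tesseract.
# Assumption: the image is already decoded into a list of rows, each row a
# list of (r, g, b) tuples with channels in 0-255. Output rows hold 0 (black)
# or 255 (white). Function names are hypothetical, not a Tesseract API.


def to_gray(pixel):
    """Grey level using ITU-R BT.601 luma weights, rounded to 0-255."""
    r, g, b = pixel
    return round(0.299 * r + 0.587 * g + 0.114 * b)


def otsu_threshold(values):
    """Otsu's method: pick the threshold t (values <= t vs > t) that
    maximizes the between-class variance of the grey-level histogram."""
    hist = [0] * 256
    for v in values:
        hist[v] += 1
    total = len(values)
    sum_all = sum(i * n for i, n in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg = 0    # number of pixels at or below t
    sum_bg = 0  # sum of grey levels at or below t
    for t in range(256):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        w_fg = total - w_bg
        if w_bg == 0:
            continue  # no pixels in the lower class yet
        if w_fg == 0:
            break  # every pixel is in the lower class; no split left
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def binarize_otsu(image):
    """Grey-scale the image, then threshold it globally with Otsu's method.
    Dark pixels (usually the text) become 0, light pixels become 255."""
    gray = [[to_gray(p) for p in row] for row in image]
    t = otsu_threshold([v for row in gray for v in row])
    return [[0 if v <= t else 255 for v in row] for row in gray]


def binarize_by_colour(image, key, tolerance=60):
    """Keep only pixels close to one colour: black where the Euclidean RGB
    distance to `key` is within `tolerance`, white everywhere else."""
    kr, kg, kb = key
    limit = tolerance * tolerance
    return [
        [0 if (r - kr) ** 2 + (g - kg) ** 2 + (b - kb) ** 2 <= limit else 255
         for (r, g, b) in row]
        for row in image
    ]


if __name__ == "__main__":
    # Tiny example: reddish "text" pixels, one blue stray mark, white paper.
    img = [
        [(255, 255, 255), (200, 0, 0), (255, 255, 255)],
        [(210, 20, 20), (0, 0, 255), (250, 250, 250)],
    ]
    print(binarize_otsu(img))                    # [[255, 0, 255], [0, 0, 255]]
    print(binarize_by_colour(img, (220, 0, 0)))  # [[255, 0, 255], [0, 255, 255]]
```

In the example, grey-scale thresholding turns the blue mark black along with the red text, because both are darker than the paper. Keying on red drops the blue mark. A single global threshold can also struggle with unevenly lit scans, which is where local (adaptive) methods may do better.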
Cheers,
Rob Komar
--
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en