As far as I know, Tesseract converts the input into a 1-bit (black and
white) image before recognition, and it uses an adaptive thresholding
method to do that.
I think the best approach is to do your own image pre-processing, which
is what I did for my project: before feeding the image into Tesseract,
I erased the background from it.
You should try doing that :)
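
To give an idea of what "erasing the background" can look like, here is a minimal sketch in plain Python. It is not what Tesseract does internally and not necessarily what I used in my project; it just applies Otsu's method (a standard global threshold) to a greyscale image stored as a list of rows of 0-255 values. The function names `otsu_threshold` and `binarize` are made up for this example, and it assumes dark text on a lighter background.

```python
def otsu_threshold(pixels):
    """Pick the grey level that best splits pixels into a dark and a light class."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_low = 0      # number of pixels with value <= t
    sum_low = 0    # sum of those pixel values
    for t in range(256):
        w_low += hist[t]
        sum_low += t * hist[t]
        w_high = total - w_low
        if w_low == 0:
            continue          # no dark pixels yet
        if w_high == 0:
            break             # every pixel is already in the dark class
        mean_low = sum_low / w_low
        mean_high = (sum_all - sum_low) / w_high
        # Between-class variance (up to a constant factor); Otsu maximises this.
        var_between = w_low * w_high * (mean_low - mean_high) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(image):
    """Return a copy of image with text pixels set to 0 and background set to 255.

    Assumes dark text on a lighter background (an assumption of this sketch).
    """
    flat = [p for row in image for p in row]
    t = otsu_threshold(flat)
    return [[0 if p <= t else 255 for p in row] for row in image]


# Tiny made-up example: values around 20-30 are "text", around 200 are "background".
sample = [
    [200, 210, 205],
    [30, 20, 200],
    [210, 25, 215],
]
cleaned = binarize(sample)
```

A single global threshold like this tends to work when the background is fairly uniform. For text printed over a photo, as in a magazine, a local (per-region) threshold or a colour-based mask is probably needed, so treat this only as a starting point.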

On Apr 3, 11:00 pm, paulfeakins <[email protected]> wrote:
> I'm working on a project where my source tiff image may have
> background colours or images behind the text.
>
> I've been able to train tesseract successfully with some other fonts,
> which works very well, but the background does seem to confuse
> tesseract a little.
>
> My question is, does tesseract perform any image pre-processing? If
> not, is it worth me trying to apply a threshold or some other type of
> optimization to the image first?
>
> I've had a brief look through the source code, but I'm not really a C++
> developer so it was a bit hard to follow. What I'm trying to achieve
> is something like reading text from a magazine where it's all printed
> on top of a background image.
>
> I'm trying to find out what sort of image tesseract is actually
> working on, as perhaps I could then train it with a more accurate
> representation of the text it needs to recognise.