Hi, I agree. A lot of image processing is about massaging your data.
- Albert

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of dythmall
Sent: Friday, April 03, 2009 10:07
To: tesseract-ocr
Subject: Re: Image Pre-Processing in Tesseract

As far as I know, Tesseract converts images into 1-bit black-and-white images, and it uses an adaptive thresholding method.

I think the best approach is to do your own image pre-processing, which is what I did for my project: before feeding the image into Tesseract, I erased the background from it. You should try doing that :)

On Apr 3, 11:00 pm, paulfeakins <[email protected]> wrote:
> I'm working on a project where my source tiff image may have
> background colours or images behind the text.
>
> I've been able to train tesseract successfully with some other fonts,
> which works very well, but the background does seem to confuse
> tesseract a little.
>
> My question is, does tesseract perform any image pre-processing? If
> not, is it worth me trying to apply a threshold or some other type of
> optimization to the image first?
>
> I've had a brief look through the source code, but I'm not really a
> C++ developer so it was a bit hard to follow. What I'm trying to
> achieve is something like reading text from a magazine where it's all
> printed on top of a background image.
>
> I'm trying to find out what sort of image tesseract is actually
> working on, as perhaps I could then train it with a more accurate
> representation of the text it needs to recognise.
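
For anyone trying the "apply a threshold first" idea discussed in this thread, here is a minimal sketch of one way to do it. It uses Otsu's global threshold, which is a swapped-in technique chosen for illustration and not necessarily what Tesseract does internally. It assumes an 8-bit greyscale image held as a list of rows, with dark text on a lighter background; the function names `otsu_threshold` and `binarize` are made up for this example.

```python
def otsu_threshold(pixels):
    """Pick the grey level (0-255) that maximises between-class variance."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    weight_dark = 0      # number of pixels at or below the candidate level
    sum_dark = 0         # sum of grey values of those pixels
    best_level = 0
    best_variance = -1.0

    for level in range(256):
        weight_dark += hist[level]
        if weight_dark == 0:
            continue                 # no dark pixels yet
        weight_light = total - weight_dark
        if weight_light == 0:
            break                    # every pixel is already in the dark class
        sum_dark += level * hist[level]
        mean_dark = sum_dark / weight_dark
        mean_light = (sum_all - sum_dark) / weight_light
        variance = weight_dark * weight_light * (mean_dark - mean_light) ** 2
        if variance > best_variance:
            best_variance = variance
            best_level = level
    return best_level


def binarize(image):
    """Return a copy of `image` with text pixels as 0 and background as 255.

    Assumption: text is darker than the background.
    """
    flat = [p for row in image for p in row]
    threshold = otsu_threshold(flat)
    return [[0 if p <= threshold else 255 for p in row] for row in image]


if __name__ == "__main__":
    # Tiny hypothetical image: a dark vertical stroke on a light, uneven background.
    sample = [
        [200, 30, 210],
        [190, 25, 205],
    ]
    for row in binarize(sample):
        print(row)
```

A single global threshold like this only helps when the background is clearly lighter or darker than the text everywhere. For text over a photograph, as in the magazine case, a locally adaptive threshold or explicit background removal is likely to work better. The binarized result would then be saved as a TIFF with whatever imaging library you already use and passed to Tesseract.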

