lol thanks Albert, now I know :) Thanks dythmall, I'd thought that might be the case. I did some tests and found that by selecting a specific area that I know will contain a certain number of characters, I can apply my own adaptive threshold based on the density of black pixels I'd expect. So far it's increased the accuracy quite a bit! Next I'm planning on training Tesseract on the black-and-white images my threshold creates rather than on the actual font being used. Hopefully, if I train it on more realistic data, it will be even more accurate.
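For anyone wanting to try the density-based threshold described above, here is a minimal sketch of one way to do it. It uses what is usually called P-tile (percentile) thresholding: if you expect a fraction p of the region's pixels to be ink, pick the grey level below which p of the pixels fall. The sketch assumes a grayscale region stored as a list of rows of 0-255 values with dark text on a lighter background. The helper `expected_black_fraction` and its `ink_pixels_per_char` parameter are hypothetical; that value would have to be measured from samples of the font at the size being used.

```python
def expected_black_fraction(num_chars, ink_pixels_per_char, width, height):
    # Hypothetical estimate of how much of the region should be ink:
    # (characters * ink pixels per character) / total pixels.
    return (num_chars * ink_pixels_per_char) / (width * height)


def ptile_threshold(pixels, black_fraction):
    """Return the grey level t such that roughly `black_fraction` of the
    pixels have value <= t (dark text on a lighter background assumed)."""
    if not 0.0 < black_fraction < 1.0:
        raise ValueError("black_fraction must be strictly between 0 and 1")
    flat = sorted(p for row in pixels for p in row)
    # Index of the darkest pixel that should still count as background,
    # minus one, i.e. the last pixel that should end up black.
    k = int(round(black_fraction * len(flat))) - 1
    k = max(0, min(len(flat) - 1, k))
    return flat[k]


def binarize(pixels, threshold):
    # 0 = black (text), 255 = white (background).
    # Pixels equal to the threshold become black, so ties can push the
    # black fraction slightly above the requested value.
    return [[0 if p <= threshold else 255 for p in row] for row in pixels]


# Usage: a tiny 5x2 region expected to hold 2 characters of 2 ink pixels each.
region = [
    [200, 30, 210, 40, 220],
    [190, 35, 205, 45, 215],
]
fraction = expected_black_fraction(2, 2, width=5, height=2)  # 0.4
t = ptile_threshold(region, fraction)                         # 45
bw = binarize(region, t)
```

Because the threshold is chosen per region from the expected ink density, it adapts to how light or dark each area is, which is the property the post relies on. The resulting 0/255 images are the kind of black-and-white output one could then feed into Tesseract training.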
I've been trying to think of ways to remove the background, but it needs to be automated. If I had a copy of the background image without the text on, I could combine them using a difference filter and hey presto the text would pop out on its own. Thanks again for the reply!

You received this message because you are subscribed to the Google Groups "tesseract-ocr" group. For more options, visit this group at http://groups.google.com/group/tesseract-ocr?hl=en
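The difference-filter idea can be sketched as follows, assuming grayscale images of identical size that are aligned pixel for pixel. Any pixel whose value differs from the clean background by more than a tolerance is treated as text. The `tolerance` of 30 is an arbitrary assumption to absorb compression noise. Since a clean background copy isn't available here, the sketch also includes a swapped-in technique that is not from the original post: if several captures share the same background but show different text at that position, a per-pixel median across them approximates the background, because at each pixel most captures show background rather than ink.

```python
from statistics import median_low


def estimate_background(frames):
    # Assumption: every frame has the same background and size, and any
    # given pixel is covered by text in fewer than half of the frames.
    height, width = len(frames[0]), len(frames[0][0])
    return [
        [median_low(f[y][x] for f in frames) for x in range(width)]
        for y in range(height)
    ]


def difference_mask(image, background, tolerance=30):
    """Return a black-and-white image: 0 where `image` differs from
    `background` by more than `tolerance` (text), 255 elsewhere."""
    if len(image) != len(background) or any(
        len(a) != len(b) for a, b in zip(image, background)
    ):
        raise ValueError("image and background must have the same size")
    return [
        [0 if abs(p - b) > tolerance else 255 for p, b in zip(img_row, bg_row)]
        for img_row, bg_row in zip(image, background)
    ]


# Usage: three one-row captures where the "text" sits at different places.
frames = [
    [[200, 200, 200]],
    [[200, 20, 200]],
    [[20, 200, 200]],
]
bg = estimate_background(frames)            # [[200, 200, 200]]
mask = difference_mask(frames[1], bg, 30)   # text only at column 1
```

Using an absolute difference means the approach works whether the text is darker or lighter than the background, though it will also flag anything else that changed between the capture and the background estimate.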

