Tesseract does poorly on images that mix text with non-text blobs,
especially when the non-text elements are larger than the text. In
such cases it is likely to ignore the text completely. I suggest you
do your own layout analysis and process the sub-images one by one,
preferably with the text lines isolated from the non-text. For
example, try to separate the player icon from the text under it.
Tesseract also does poorly when it sees two lines of very different
height: in that case it may not ignore the text below the large icon,
but it might get it all wrong because of bad height assumptions.
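
As a rough illustration of the kind of layout analysis I mean, here is
a minimal sketch in plain Python. It assumes the image has already
been binarized into a list of rows (1 = ink, 0 = background); the
function name split_into_bands and the min_gap parameter are my own
inventions for this example, not part of Tesseract. It cuts the image
into horizontal bands wherever there is a run of blank rows, so that
an icon and the text line under it end up in separate sub-images.

```python
def split_into_bands(pixels, min_gap=1):
    """Split a binary image into horizontal bands of ink.

    pixels  -- list of rows, each a list of 0 (background) / 1 (ink)
    min_gap -- number of consecutive blank rows needed to end a band
    Returns a list of (top, bottom) row ranges, bottom exclusive.
    """
    bands = []
    start = None    # first row of the band being built, if any
    blank_run = 0   # blank rows seen since the last row with ink
    for y, row in enumerate(pixels):
        if any(row):
            if start is None:
                start = y
            blank_run = 0
        elif start is not None:
            blank_run += 1
            if blank_run >= min_gap:
                # The band ended at the last row that contained ink.
                bands.append((start, y - blank_run + 1))
                start = None
                blank_run = 0
    if start is not None:
        # Image ended while a band was open; trim trailing blank rows.
        bands.append((start, len(pixels) - blank_run))
    return bands


# Toy image: a tall "icon" (rows 1-4) above a one-row "text line" (row 6).
image = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [1, 0, 1, 1, 0, 1],
    [0, 0, 0, 0, 0, 0],
]
bands = split_into_bands(image)
sub_images = [image[top:bottom] for top, bottom in bands]
```

Each entry of sub_images can then be saved and passed to Tesseract on
its own. A real screenshot will probably need a larger min_gap and a
similar pass over columns, but the idea is the same.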

You will also need to upscale the sub-images (make them larger).
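
To make "upscale" concrete, here is a small plain-Python sketch that
enlarges a pixel grid by an integer factor using nearest-neighbour
replication. The function name upscale and the factor of 2 are only
assumptions for the example; I am not claiming any particular factor
is right for your screenshots, and an imaging library with smoother
interpolation will usually give better results than this.

```python
def upscale(pixels, factor):
    """Enlarge a 2D pixel grid by an integer factor.

    Every source pixel becomes a factor x factor block of the same
    value (nearest-neighbour replication).
    """
    if factor < 1:
        raise ValueError("factor must be >= 1")
    out = []
    for row in pixels:
        # Repeat each pixel horizontally...
        wide = [value for value in row for _ in range(factor)]
        # ...then repeat the widened row vertically (as separate lists).
        out.extend(list(wide) for _ in range(factor))
    return out


small = [
    [1, 0],
    [0, 1],
]
big = upscale(small, 2)
```

Here big has twice as many rows and columns as small. Try a few
factors and see which one gives the best recognition.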

Patrick

On Jun 19, 5:18 pm, js2002 <[email protected]> wrote:
> Hi,
>
> This is my result with Tesseract:
>
> http://dl.dropbox.com/u/31225678/Screenshot%20-%2017.06.2011%20%2C%20...
>
> The samples etc. are read perfectly.
> What am I doing wrong?
