Tesseract does very poorly on images that mix text with non-text blobs, especially when the non-text elements are larger than the text. In these cases it is likely to ignore the text completely. I suggest you do your own layout analysis and process sub-images one by one, preferably with text lines isolated from non-text. For example, try to separate the player icon from the text under it, because Tesseract also does very poorly with two lines of very different heights: in that case it may not ignore the text below the large icon, but it might get it all wrong because of bad height assumptions.
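As a rough illustration of this kind of pre-processing, here is a minimal sketch in plain Python. It assumes the screenshot has already been binarized into rows of 0/1 pixels and that the icon and the text under it are separated by at least one blank pixel row. The helper names (`find_row_bands`, `crop_rows`, `upscale`, `split_and_upscale`) are made up for this example and are not part of Tesseract. A real pipeline would more likely use an imaging library for the cropping and resizing.

```python
from typing import List, Tuple

# Assumption: a binarized image stored as rows of pixels, 1 = ink, 0 = background.
Image = List[List[int]]


def find_row_bands(img: Image) -> List[Tuple[int, int]]:
    """Return (top, bottom) for each run of consecutive rows containing ink.

    This is a simple horizontal projection: blank rows act as separators,
    so a tall icon and the short text line under it land in separate bands.
    """
    bands: List[Tuple[int, int]] = []
    start = None
    for y, row in enumerate(img):
        has_ink = any(row)
        if has_ink and start is None:
            start = y                      # a new band begins
        elif not has_ink and start is not None:
            bands.append((start, y))       # band ended on the previous row
            start = None
    if start is not None:                  # band runs to the bottom edge
        bands.append((start, len(img)))
    return bands


def crop_rows(img: Image, top: int, bottom: int) -> Image:
    """Copy rows top..bottom-1 into a new sub-image."""
    return [row[:] for row in img[top:bottom]]


def upscale(img: Image, factor: int) -> Image:
    """Enlarge by an integer factor using nearest-neighbour repetition."""
    out: Image = []
    for row in img:
        wide = [px for px in row for _ in range(factor)]
        out.extend(wide[:] for _ in range(factor))
    return out


def split_and_upscale(img: Image, factor: int = 3) -> List[Image]:
    """Split into row bands, then enlarge each band separately."""
    return [upscale(crop_rows(img, t, b), factor) for t, b in find_row_bands(img)]


# Toy example: a 2-row "icon", a blank row, then a 1-row "text line".
page = [
    [1, 1, 0],
    [1, 1, 0],
    [0, 0, 0],
    [0, 1, 1],
]
sub_images = split_and_upscale(page, factor=2)
# Each entry of sub_images would then be passed to Tesseract on its own.
```

Splitting on blank rows is only the simplest possible layout analysis. It will not separate an icon from text that sits beside it rather than below it, and the enlargement factor is a guess that would need tuning on real screenshots.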
You will also need to upscale the sub-images here (make them larger).

Patrick

On Jun 19, 5:18 pm, js2002 <[email protected]> wrote:
> Hi,
>
> this is my result with teseract:
>
> http://dl.dropbox.com/u/31225678/Screenshot%20-%2017.06.2011%20%2C%20...
>
> the samples etc are read perfectly.
> What do I do wrong?

