You will probably need a better binarization technique. See [1] and [2].

[1]: https://groups.google.com/d/topic/tesseract-ocr/y-Yjxr1tRTQ/discussion
[2]: https://groups.google.com/d/topic/tesseract-ocr/neyvXo2TAn0/discussion
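To illustrate what "binarization" means here, below is a minimal pure-Python sketch of Otsu's method, a common global thresholding technique. It is not necessarily what the linked threads recommend. It assumes an 8-bit grayscale image stored as a list of rows and dark glyphs on a lighter background, and the helper names `otsu_threshold` and `binarize` are hypothetical. In practice a library call such as OpenCV's `cv2.threshold` with the `THRESH_OTSU` flag does the same job.

```python
def otsu_threshold(pixels):
    """Return the Otsu threshold for a flat list of 8-bit grayscale values (0-255)."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    sum_bg = 0        # intensity sum of pixels <= t
    weight_bg = 0     # count of pixels <= t
    best_t, best_var = 0, -1.0
    for t in range(256):
        weight_bg += hist[t]
        if weight_bg == 0:
            continue
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        # Between-class variance; Otsu picks the t that maximizes it.
        var_between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(image, threshold):
    """Map pixels <= threshold to 0 (ink) and the rest to 255 (background).

    Assumption: glyphs are darker than the tile background; invert the image
    first if the letters are light on dark.
    """
    return [[0 if p <= threshold else 255 for p in row] for row in image]


if __name__ == "__main__":
    img = [[10, 200], [200, 10]]
    t = otsu_threshold([p for row in img for p in row])
    print(t, binarize(img, t))
```

A single global threshold can fail when lighting or tile colours vary across the screenshot. Adaptive (local) thresholding is the usual alternative in that case.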
On Tuesday, 8 July 2014 07:31:39 UTC+2, Alex Ryan wrote:

> I'm trying to make a Words with Friends cheat for a university project. I'm obviously trying to OCR the tiles from a screenshot of the app. I have Tesseract 3.03 set up and running fine, but I'm not getting usable output. I've tried various training methods but so far haven't hit upon the right one, and was hoping someone had some suggestions for me.
>
> Here's a sample image if you are unfamiliar with the program:
>
> http://i.imgur.com/kAzXxJP.jpg
>
> I've trained Tesseract using each tile as a letter of a new font, but that doesn't seem to work: it still sees the actual letter and the number on the tile as two separate parts instead of as one letter. I tried changing "textord_min_linesize", as suggested in the FAQ for diacritics (a similar issue to mine), but if I set a value higher than the default of 1.25 it doesn't see anything at all in the picture and I get "Empty page!!". I've also tried various image preprocessing, and that hasn't helped either.
>
> Ideally I'd like to be able to differentiate between a normal "J" tile with the small "10" in the top right corner (the score for that letter) and a "J" tile without a number, since that means it was a wild-card tile in the game, and I would like to keep track of those. But if I have to scrap that at this point I'm willing, because I just want to get something to work. If I could get Tesseract to ignore all the tiny numbers and other noise and only read the letters, I would be pleased.
>
> I also can't figure out how it's scanning the image. Sometimes it goes top to bottom, right to left; other times it seems to go left to right, top to bottom; and sometimes it just seems to jump around.
>
> I know what I'm trying to do is possible, as there are various marketplace apps that accomplish this task, and some of them mention using Tesseract. I just can't for the life of me figure out how.
>
> Sorry for the length of this post; I'm just desperate for any help and want to make sure I express myself correctly. I've spent at least 30 hours on this already, and while I have the whole training aspect down (which was incredibly confusing to me when I first started), I still don't feel any closer to having something useful, and the project deadline keeps getting closer.
>
> My most humble and sincere thanks for any help or suggestions you may have.
>
> Cheers,
>
> Alex

