I'm trying to make a Words With Friends cheat for a university project, which means I need to OCR the tiles from a screenshot of the app. I have Tesseract 3.03 set up and running fine, but I'm not getting usable output. I've tried various training methods, but so far none has worked, and I was hoping someone had some suggestions for me.
Here's a sample image if you are unfamiliar with the program: http://i.imgur.com/kAzXxJP.jpg

I've trained Tesseract using each tile as a letter of a new font, but that doesn't seem to work: it still sees the letter and the number on the tile as two separate parts instead of as parts of the same glyph. I tried changing "textord_min_linesize" as suggested in the FAQ entry on diacritics, which seems like a similar issue to mine, but if I set a value higher than the default of 1.25 it doesn't see anything at all in the picture and I get "Empty page!!". I've tried various kinds of image preprocessing and that hasn't helped either.

Ideally I'd like to be able to differentiate between a normal "J" tile, with the small "10" in the top right corner (the score for that particular letter), and a "J" tile without a number, since that means it was a "wild card" tile in the game and I would like to keep track of those. If I have to scrap that at this point I'm willing to, because I just want to get something to work. If I could get Tesseract to ignore all the tiny numbers and other noise and read only the letters, I would be pleased.

I also can't figure out how it's scanning the image. Sometimes it goes top to bottom, right to left; other times it seems to go left to right, top to bottom; and sometimes it just seems to jump around.

I know what I'm trying to do is possible, as there are various marketplace apps that accomplish this task, and some of them mention using Tesseract. I just can't for the life of me figure out how.

Sorry for the length of this post; I'm just desperate for any help and want to make sure I express myself correctly. I've spent at least 30 hours on this already, and while I have the whole training aspect down (which was incredibly confusing to me when I first started), I still don't feel any closer to actually having something useful, and the project deadline keeps getting closer.
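One direction that might sidestep both the reading-order problem and the small score numbers is to stop feeding Tesseract the whole screenshot and instead cut the board into its 15 x 15 squares, handling one square at a time in a known order. What follows is only a rough sketch of the geometry in plain Python, not a tested solution: the board offsets, the `corner_frac` value, and the helper names `cell_boxes` and `split_tile` are all assumptions made up for illustration, and real values would have to be measured from the screenshot.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels

GRID = 15  # the Words With Friends board is 15 x 15 squares


def cell_boxes(left: int, top: int, size: int,
               grid: int = GRID) -> List[Tuple[int, int, Box]]:
    """Return (row, col, box) for every board square, in row-major order.

    left, top and size describe the square board area inside the
    screenshot; these are hypothetical inputs, not values read from the app.
    """
    cells = []
    for row in range(grid):
        for col in range(grid):
            # Integer arithmetic keeps neighbouring boxes edge to edge.
            x0 = left + col * size // grid
            y0 = top + row * size // grid
            x1 = left + (col + 1) * size // grid
            y1 = top + (row + 1) * size // grid
            cells.append((row, col, (x0, y0, x1, y1)))
    return cells


def split_tile(box: Box, corner_frac: float = 0.35) -> Tuple[Box, Box]:
    """Split one tile into (letter_box, score_box).

    Assumption: the score digits sit entirely inside the top-right
    corner_frac of the tile, and the letter fits in the remaining
    left-hand strip. Both need checking against real tiles.
    """
    x0, y0, x1, y1 = box
    corner_w = int((x1 - x0) * corner_frac)
    corner_h = int((y1 - y0) * corner_frac)
    letter_box = (x0, y0, x1 - corner_w, y1)
    score_box = (x1 - corner_w, y0, x1, y0 + corner_h)
    return letter_box, score_box
```

Each `letter_box` could then be cropped out and recognised on its own, for example with Tesseract's single-character page segmentation mode (mode 10), while checking whether the `score_box` contains any dark pixels might be enough to tell a wild card tile from a normal one. Whether this works well on the real tiles is untested here.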
My most humble and sincere thanks for any help or suggestions you may have.

Cheers,
Alex