I'm trying to make a Words with Friends cheat for a university project. I'm 
trying to OCR the tiles from a screenshot of the app. I have 
Tesseract 3.03 set up and running fine, but I'm not getting usable output. 
I've tried various training methods but so far haven't hit upon the right 
one, and was hoping someone had some suggestions for me.

Here's a sample image in case you are unfamiliar with the game:

http://i.imgur.com/kAzXxJP.jpg

I've trained Tesseract using each tile as a letter of a new font, but that 
doesn't seem to work: it still sees the letter and the number on the 
tile as two separate parts instead of as parts of the same glyph. I 
tried changing "textord_min_linesize" as suggested in the FAQ entry on 
diacritics, which seems like a similar issue to mine, but if I set a 
value higher than the default of 1.25 it doesn't see anything at all in 
the picture and I get an "Empty page!!" message. I've tried various kinds 
of image preprocessing and that hasn't helped either.

Ideally I'd like to be able to differentiate between a normal "J" tile with 
the small "10" in the top right corner (the score for that particular 
letter) and a "J" tile without a number, since the latter means it was a 
"wild card" tile in the game, and I would like to keep track of those. But 
I'm willing to scrap that at this point because I just want to get 
something to work. If I could get Tesseract to ignore all the tiny numbers 
and other noise and read only the letters, I would be pleased.
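
To make the tile distinction concrete, here is a rough sketch of how the 
score corner might be handled before OCR rather than by Tesseract itself. It 
assumes each tile has already been cropped out and converted to a grid of 
grayscale values (0 = black, 255 = white), with dark glyphs on a light tile. 
The function names, the 35% corner size, and the thresholds are all 
hypothetical and would need tuning against real screenshots.

```python
def corner_box(tile, corner_frac):
    """Return (corner_height, first_column) for the top-right corner region."""
    height, width = len(tile), len(tile[0])
    corner_h = max(1, int(height * corner_frac))
    corner_w = max(1, int(width * corner_frac))
    return corner_h, width - corner_w


def has_score_mark(tile, corner_frac=0.35, dark_threshold=100,
                   min_dark_ratio=0.03):
    """Guess whether a tile carries a score number in its top-right corner.

    `tile` is a list of rows of grayscale values. The corner counts as
    "marked" when the share of dark pixels reaches `min_dark_ratio`.
    All default values here are assumptions, not measured constants.
    """
    corner_h, first_col = corner_box(tile, corner_frac)
    corner = [row[first_col:] for row in tile[:corner_h]]
    total = sum(len(row) for row in corner)
    dark = sum(1 for row in corner for value in row if value < dark_threshold)
    return dark / total >= min_dark_ratio


def mask_score_corner(tile, corner_frac=0.35, background=255):
    """Return a copy of the tile with the top-right corner painted over,
    so that only the large letter is left for the OCR step."""
    corner_h, first_col = corner_box(tile, corner_frac)
    masked = [list(row) for row in tile]
    for row in masked[:corner_h]:
        for x in range(first_col, len(row)):
            row[x] = background
    return masked
```

With something like this, a tile where `has_score_mark` is false would be 
recorded as a wild card, and every tile would go through 
`mask_score_corner` before being handed to Tesseract, so the small number 
never reaches the recognizer.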

I also can't figure out how it's scanning the image. Sometimes it goes top to 
bottom, right to left, and other times it seems to go left to right, top to 
bottom. And sometimes it just seems to jump around.
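
One way to take the reading order out of Tesseract's hands would be to cut 
the board into cells myself and recognize one cell at a time. The sketch 
below only computes the cell rectangles. It assumes a 15x15 board whose 
pixel position in the screenshot is already known; `grid_cells` and its 
parameters are hypothetical names. Each cropped cell could then be run 
through Tesseract in single-character mode (`-psm 10` in the 3.x command 
line), though I haven't confirmed that this gives good results on these 
tiles.

```python
def grid_cells(left, top, width, height, rows=15, cols=15):
    """Yield (row, col, box) for every board cell, left to right and
    top to bottom. `box` is (x0, y0, x1, y1) in screenshot pixels.

    `left`, `top`, `width`, `height` describe the board rectangle inside
    the screenshot and are assumed to be known in advance.
    """
    for r in range(rows):
        # Integer division keeps cell edges aligned even when the board
        # size is not an exact multiple of the number of cells.
        y0 = top + (r * height) // rows
        y1 = top + ((r + 1) * height) // rows
        for c in range(cols):
            x0 = left + (c * width) // cols
            x1 = left + ((c + 1) * width) // cols
            yield r, c, (x0, y0, x1, y1)
```

Because the loop fixes the order, the recognized letters can be written 
straight into a 15x15 array indexed by `row` and `col`, regardless of how 
Tesseract would have ordered the text on the full page.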

I know what I'm trying to do is possible as there are various marketplace 
apps that accomplish this task, and some of them mention using Tesseract. I 
just can't for the life of me figure out how.

Sorry for the length of this post; I'm just desperate for any help and want 
to make sure I express myself correctly. I've spent at least 30 hours on 
this already, and while I have the whole training aspect down (which was 
incredibly confusing to me when I first started), I still don't feel any 
closer to actually having something useful, and the project deadline keeps 
getting closer.

My most humble and sincere thanks for any help or suggestions you may have.

Cheers,

Alex
