You will probably need a better binarization technique. See [1], [2].

[1]: https://groups.google.com/d/topic/tesseract-ocr/y-Yjxr1tRTQ/discussion
[2]: https://groups.google.com/d/topic/tesseract-ocr/neyvXo2TAn0/discussion
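
As a rough illustration of what "binarization" means here, below is a minimal sketch of Otsu's global thresholding in plain Python. This is only one common technique and an assumption on my part; the threads linked above may recommend a different method. The function names `otsu_threshold` and `binarize` are made up for this example, and the input is assumed to be a flat list of 8-bit grayscale values (0-255) that you have already extracted from the screenshot.

```python
def otsu_threshold(pixels):
    """Return the gray level t (0-255) that maximizes between-class variance.

    `pixels` is a flat list of integer grayscale values in 0..255.
    Pixels <= t form one class, pixels > t the other.
    """
    # Build a 256-bin histogram of gray levels.
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels in the lower class
    sum0 = 0    # sum of gray levels in the lower class
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue            # lower class still empty
        w1 = total - w0
        if w1 == 0:
            break               # upper class empty, nothing left to split
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        # Between-class variance, up to a constant factor of 1 / total**2.
        var = w0 * w1 * (mean0 - mean1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def binarize(pixels, threshold):
    """Map every pixel to 0 (at or below threshold) or 255 (above it)."""
    return [255 if p > threshold else 0 for p in pixels]


if __name__ == "__main__":
    # Toy "image": half dark pixels, half bright pixels.
    sample = [10] * 50 + [200] * 50
    t = otsu_threshold(sample)
    print("threshold:", t)
    print("distinct output values:", sorted(set(binarize(sample, t))))
```

A single global threshold may not be enough for a screenshot with coloured tiles and bonus squares; in that case a per-tile or adaptive threshold is probably worth trying, applied before the image is handed to Tesseract.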

On Tuesday, 8 July 2014 07:31:39 UTC+2, Alex Ryan wrote:
>
> I'm trying to make a Words with Friends cheat for a university project. 
> I'm trying to OCR the tiles from a screenshot of the app. I have 
> Tesseract 3.03 set up and running fine, but I'm not getting usable output. 
> I've tried various training methods but so far haven't hit upon the right 
> method and was hoping someone had some suggestions for me.
>
> Here's a sample image if you are unfamiliar with the program:
>
> http://i.imgur.com/kAzXxJP.jpg
>
> I've trained Tesseract using each tile as a letter of a new font. But that 
> doesn't seem to work, as it still sees the actual letter and number on the 
> tile as two different parts instead of as all part of the same letter. I 
> tried changing "textord_min_linesize" as suggested in the FAQ for 
> solutions to diacritics, which would be a similar issue to what I'm having, 
> but if I input a value higher than the default of 1.25 then it doesn't see 
> anything at all in the picture and I get an "Empty page!!" message. I've 
> tried various kinds of image preprocessing and that hasn't helped either.
>
> Ideally I'd like to be able to differentiate between a normal "J" tile with 
> the small "10" in the top right corner (the score for that particular 
> letter) and a "J" tile without a number, as that means it was a "wild card" 
> tile in the game, as I would like to keep track of those. But if I have to 
> scrap that at this point I'm willing because I just want to get something 
> to work. Meaning if I could get Tesseract to ignore all the tiny numbers 
> and other noise and only read the letters I would be pleased.
>
> I also can't figure out how it's scanning the image. Sometimes it goes top 
> to bottom, right to left, and other times it seems to go left to right, top 
> to bottom. And sometimes it just seems to jump around.
>
> I know what I'm trying to do is possible as there are various marketplace 
> apps that accomplish this task, and some of them mention using Tesseract. I 
> just can't for the life of me figure out how.
>
> Sorry for the length of this post, I'm just desperate for any help and 
> want to make sure I express myself correctly. I've spent at least 30 hours 
> on this already, and while I have the whole training aspect down (which was 
> incredibly confusing to me when I first started), I still don't feel any 
> closer to actually having something useful, and the project deadline keeps 
> getting closer.
>
> My most humble and sincere thanks for any help or suggestions you may have.
>
> Cheers,
>
> Alex
>
