*mind blown* This is a much better approach!! Especially how quickly I found something like this: http://www.mfgg.net/index.php?act=resdb&param=03&c=1&id=5425
<https://lh3.googleusercontent.com/-kPwR92wxmRc/VTlovAicNCI/AAAAAAAAAT0/HSUlckQHM1I/s1600/smb_fonts.png>

There will be a learning curve, but I agree this will be a much more accurate approach. The link you sent me is perfect for understanding the theory and a possible workflow. Would you happen to have another project like tesseract (Linux/OS X based) I could investigate for this purpose? And thank you very much for shifting my attention away from OCR. NES games can only have so many palettes (which you can easily extract) and are restricted to certain sizes, so it should be easy to create a matching library by hand.

On Thursday, April 23, 2015 at 1:53:29 PM UTC-7, Dmitri Silaev wrote:
>
> Don't waste your time with Tesseract here, I tell ya. You'd only get all sorts of unnecessary hassle. And what's most important, you'll be frustrated by the accuracy.
>
> By "pixel-to-pixel" I mean what is described e.g. here, section "Naive Template Matching":
>
> http://docs.adaptive-vision.com/current/studio/machine_vision_guide/TemplateMatching.html
>
> But in your case that wouldn't be dumb iteration over the entire image, but a single check in a fixed location of whether the template image has exactly the same pixels as the input image. You can arrange it like this:
> - Crop out samples of all digits (each sized 85x60) -> digit0.png .. digit9.png
> - Crop out the same sized rectangle from a fixed location of your source image - e.g. score digit #0 -> score0.png
> - Do a file compare of score0.png to digit0.png
> - If no match - try digit1.png
> ...
> - Match found - this is your score digit #0
> - Take the next score digit
> ...
> - Proceed to the time digits
> ...
> - Done
>
> Simple!
>
> The above approach would probably adapt to other games, and you'd manage to use the same digit samples.
> File compare might be replaced by XOR and then calculating the mean of all pixels (should be 0 if they match).
> There can be other methods of comparison. You get the point.
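A minimal Python sketch of the fixed-location check described above, assuming the screenshot and the digit samples have already been decoded into rows of RGB tuples (for example with Pillow's `Image.open(path).convert("RGB")`, not shown). The helper names `crop`, `mismatch_ratio`, `recognize_digit` and `read_number`, the coordinates you would pass in, and the assumption that the digits sit side by side with no gap are all hypothetical. `mismatch_ratio` plays the role of the XOR-then-mean comparison, so 0.0 means an exact match.

```python
from typing import Dict, List, Optional, Tuple

Pixel = Tuple[int, int, int]   # one decoded RGB pixel
Bitmap = List[List[Pixel]]     # rows of pixels, indexed bitmap[y][x]


def crop(image: Bitmap, x: int, y: int, w: int, h: int) -> Bitmap:
    """Cut out the w x h rectangle whose top-left corner is (x, y)."""
    return [row[x:x + w] for row in image[y:y + h]]


def mismatch_ratio(a: Bitmap, b: Bitmap) -> float:
    """XOR-then-mean: the fraction of pixels that differ (0.0 means identical)."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("images must have the same size")
    total = sum(len(row) for row in a)
    if total == 0:
        raise ValueError("images must not be empty")
    differing = sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return differing / total


def recognize_digit(cell: Bitmap, templates: Dict[str, Bitmap],
                    tolerance: float = 0.0) -> Optional[str]:
    """Try each digit sample in turn (digit0, digit1, ...); return the first match."""
    for digit, template in templates.items():
        if mismatch_ratio(cell, template) <= tolerance:
            return digit
    return None


def read_number(screen: Bitmap, x: int, y: int, count: int,
                templates: Dict[str, Bitmap], cell_w: int, cell_h: int,
                tolerance: float = 0.0) -> Optional[str]:
    """Read `count` digit cells laid out left to right from a fixed (x, y).

    Returns None as soon as a cell matches no sample, rather than guessing.
    """
    digits = []
    for i in range(count):
        cell = crop(screen, x + i * cell_w, y, cell_w, cell_h)
        digit = recognize_digit(cell, templates, tolerance)
        if digit is None:
            return None
        digits.append(digit)
    return "".join(digits)
```

A `tolerance` above zero is only meant as a fallback in case the capture is scaled or filtered; with raw emulator output and constant NES palette colors, exact equality should be what you want. Comparing decoded pixels rather than doing a literal file compare (e.g. Python's `filecmp.cmp(a, b, shallow=False)`) also avoids false mismatches when two crops with identical pixels were saved by different tools or settings and therefore differ byte for byte.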
>
> You'd better invest your time into accumulating a collection of score digit coordinates for each game than into a struggle with quirky OCR results.
>
> Well, unless you're eager to.
>
> Best regards,
> Dmitri Silaev
> www.CustomOCR.com
>
> On Thu, Apr 23, 2015 at 10:51 PM, Leah Siddall <[email protected]> wrote:
>
>> Also, I wanted to show the output from the lower third:
>>
>> <https://lh3.googleusercontent.com/-c527dWolCTI/VTlMFuVJG2I/AAAAAAAAATg/i7ELL6UpHFY/s1600/mario_lower.png>
>>
>> E53333 I-I-I-I-I-|--E.|- $5 a].
>> ICED}: E]- EIEIEIEEHEIEJ GEE-'3
>>
>> As you can see, I'm not even getting numbers. :/
>>
>> On Thursday, April 23, 2015 at 12:28:14 PM UTC-7, Leah Siddall wrote:
>>>
>>> Thanks for your feedback!
>>>
>>> I was kinda hoping not to lock into one video game, and since the high score may not be in the same place in every game, precise cropping is ruled out. I planned on running a regex against whatever came back from tesseract. I was already counting on garbage output, so there was going to be some light scripting wrapping this.
>>>
>>> But when I cropped to only the "lower third" of the Mario screenshot, I was still not getting anything close to the score or time. Why is it struggling with this font? It seems incredibly straightforward, except that the "scores" are not a solid color with a border, and sometimes they are touching.
>>>
>>> Since this is a new arena to me, can you point me in the right direction for researching how to do the "pixel-to-pixel" matching?
>>> And I am new to the idea of training tesseract. Can I train it to understand this font?
>>>
>>> This is more exploratory and fun for me, so I am very willing to learn the "correct way" of doing this. I just want to be pointed in the right direction.
>>>
>>> Thanks again!!
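Since the score may not sit at the same coordinates in every game, the exhaustive form of the same idea (the "Naive Template Matching" section linked in the thread) slides each digit sample over every position and keeps the places where all pixels agree. The sketch below again assumes images are already decoded into rows of RGB tuples; `find_exact` and `locate_digits` are hypothetical names, and sorting the hits into reading order is only a rough stand-in for grouping digits into the score and the timer.

```python
from typing import Dict, List, Tuple

Pixel = Tuple[int, int, int]   # one decoded RGB pixel
Bitmap = List[List[Pixel]]     # rows of pixels, indexed bitmap[y][x]


def find_exact(screen: Bitmap, template: Bitmap) -> List[Tuple[int, int]]:
    """Naive template matching with an exact-equality test.

    Tries every top-left corner (x, y) where the template fits and keeps
    those where every template row equals the screen row slice under it.
    """
    th, tw = len(template), len(template[0])
    sh, sw = len(screen), len(screen[0])
    hits = []
    for y in range(sh - th + 1):
        for x in range(sw - tw + 1):
            if all(screen[y + dy][x:x + tw] == template[dy] for dy in range(th)):
                hits.append((x, y))
    return hits


def locate_digits(screen: Bitmap,
                  templates: Dict[str, Bitmap]) -> List[Tuple[int, int, str]]:
    """Find every occurrence of every digit sample.

    Returns (y, x, digit) triples sorted top to bottom, then left to right,
    so digits on the same row come out in reading order.
    """
    hits = []
    for digit, template in templates.items():
        for x, y in find_exact(screen, template):
            hits.append((y, x, digit))
    return sorted(hits)
```

This pure-Python search does a comparison at every pixel position for every template, so on a full 4800 × 3600 capture you would probably want to search only a band of the screen (such as the lower third) or hand the search to an optimized routine such as OpenCV's `matchTemplate`. The samples should also include the blank padding around each glyph; otherwise a narrow digit like 1 can match inside the edge of a wider one.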
>>>
>>> On Thursday, April 23, 2015 at 2:51:55 AM UTC-7, Dmitri Silaev wrote:
>>>>
>>>> Hmmm, fixed image size, fixed region, constant colors, monospace raster font...
>>>>
>>>> Do you really want to engage a whole algorithmic monster to handle a problem like this? Not to mention poor performance, training and preprocessing; coping with all sorts of recognition problems is guaranteed.
>>>>
>>>> Pixel-to-pixel matching is the way to go!
>>>> 100% accuracy.
>>>>
>>>> Even if you're not willing to resort to full-fledged programming, just crop out 10 digit samples and match them against your input image using a shell script loop. Give your ImageMagick-fu a chance. Or you can even use file compare! ))
>>>>
>>>> HTH
>>>>
>>>> Best regards,
>>>> Dmitri Silaev
>>>> www.CustomOCR.com
>>>>
>>>> On Thu, Apr 23, 2015 at 9:05 AM, Leah Siddall <[email protected]> wrote:
>>>>
>>>>> Hi all!
>>>>>
>>>>> I am not having luck with tesseract and the fonts used in NES games like Super Mario Bros. 3 (I've attached an example screenshot).
>>>>> My goal is to scrape a screenshot for the "score" and "time remaining".
>>>>> The idea is to feed that into a database during a competition to minimize cheating.
>>>>>
>>>>> I've tried cropping, resizing, grayscale, and negating with PNG, TIF, JPG, and PNM formats, then going through every PSM mode on each, with poor results.
>>>>> The original screenshot is a 4800 × 3600 pixel PNG at 144 pixels/inch, straight from the emulator, which is about the best possible situation.
>>>>>
>>>>> Just to get a baseline, I tried the "Punch Out" screenshot (attached), where the characters are clearly spaced with lots of empty space. It would get "CDHTIHUE" and "Nintendo", but totally missed the word "new" between the boxing gloves and jumbled the year numbers.
>>>>>
>>>>> To rule out user error, I did run it against other images with more standard fonts and had no problems.
>>>>>
>>>>> I'm quite comfortable with ImageMagick but very new to tesseract.
>>>>> I am using the tesseract version from "brew install tesseract --HEAD" on OS X 10.10.2:
>>>>> tesseract 3.04.00
>>>>> leptonica-1.71
>>>>> libjpeg 8d : libpng 1.6.16 : libtiff 4.0.3 : zlib 1.2.5
>>>>>
>>>>> This would be really, really cool to pull off if possible. Any suggestions are greatly appreciated.
>>>>> Thanks!! -leah
>>>>>
>>>>> --
>>>>> You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
>>>>> To post to this group, send email to [email protected].
>>>>> Visit this group at http://groups.google.com/group/tesseract-ocr.
>>>>> To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/2088977c-529b-45bd-8059-b6906fb666ce%40googlegroups.com.
>>>>> For more options, visit https://groups.google.com/d/optout.

