Thanks for your feedback! I was hoping not to lock into one video game, so cropping to a fixed region is out: the high score will not be in the same place from game to game. I planned on running a regex against whatever came back from tesseract. I was already counting on garbage in the output, so there was always going to be some light scripting wrapped around this.
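The regex idea could look roughly like the sketch below. Everything in it is an assumption for illustration: the function names, the table of letter-for-digit confusions, and the field widths (a 7-digit score and a 3-digit timer) are guesses that would need checking against real tesseract output for each game.

```python
import re

# Assumed letter-for-digit confusions; adjust after inspecting real OCR output.
CONFUSIONS = str.maketrans({"O": "0", "o": "0", "I": "1", "l": "1", "S": "5", "B": "8"})


def extract_digit_runs(ocr_text, min_len=3):
    """Return every run of at least min_len ASCII digits in noisy OCR text."""
    cleaned = ocr_text.translate(CONFUSIONS)
    return re.findall(r"[0-9]{%d,}" % min_len, cleaned)


def extract_score_and_time(ocr_text, score_len=7, time_len=3):
    """Pick the first run matching each assumed field width, or None if absent."""
    runs = extract_digit_runs(ocr_text, min_len=min(score_len, time_len))
    score = next((r for r in runs if len(r) == score_len), None)
    time_left = next((r for r in runs if len(r) == time_len), None)
    return score, time_left


print(extract_score_and_time("W0RLD 1  ~~ 0O12345  $ 3 | 2S7"))
```

Matching on field width alone is fragile: any word made only of the confusable letters would be read as a number, so a real script would probably also check where on the line each run appears.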
But when I cropped to just the "lower third" section of the Mario screenshot, I was still not getting anything close to the score or time. Why is it struggling with this font? It seems incredibly straightforward, except that the "scores" are not a solid color with a border, and sometimes they are touching.

Since this is a new arena to me, can you point me in the right direction for researching how to do the "pixel-to-pixel" matching? I am also new to the idea of training tesseract. Can I train it to understand this font?

This is more exploratory and fun for me, so I am very willing to learn the "correct way" of doing this. I just want to be pointed in the right direction. Thanks again!!

On Thursday, April 23, 2015 at 2:51:55 AM UTC-7, Dmitri Silaev wrote:
>
> Hmmm, fixed image size, fixed region, constant colors, monospace raster
> font...
>
> Do you really want to engage a whole algorithmic monster to handle a
> problem like this? Not to mention poor performance, training,
> preprocessing, coping with all sorts of recognition problems is guaranteed.
>
> Pixel-to-pixel matching is the way to go!
> 100% accuracy.
>
> Even if you are not willing to resort to full-fledged programming - just crop
> out 10 digit samples and match them to your input image using a shell
> script loop. Give your ImageMagick-fu a chance. Or, you can even use file
> compare! ))
>
> HTH
>
> Best regards,
> Dmitri Silaev
> www.CustomOCR.com
>
> On Thu, Apr 23, 2015 at 9:05 AM, Leah Siddall <[email protected]> wrote:
>
>> Hi all!
>>
>> I am not having luck with tesseract and the fonts used in NES games like
>> Super Mario Bros. 3. ( I've attached an example screenshot ).
>> My goal is to scrape a screenshot for the "score" and "time remaining". The
>> idea is to feed that into a database during a competition to minimize
>> cheating.
>>
>> I've tried cropping, resizing, grayscale, and negating with PNG, TIF,
>> JPG, and PNM formats, then going through every PSM mode on each, with poor
>> results.
>> The original screenshot is PNG 4800 × 3600 pixels at 144 pixels/inch
>> straight from the emulator, which is like the best possible situation.
>>
>> Just trying to get a baseline, I tried against the "Punch Out" screenshot
>> ( attached ) where the fonts are clearly spaced and there is lots of empty
>> space. It would get "CDHTIHUE" and "Nintendo", but it totally missed the
>> word "new" between the boxing gloves and jumbled the year numbers.
>>
>> To rule out user error, I did run against other images with more standard
>> fonts and had no problems.
>>
>> I'm quite comfortable with imagemagick but very new to tesseract.
>> I am using the tesseract version from "brew install tesseract --HEAD" on
>> OSX 10.10.2
>> tesseract 3.04.00
>> leptonica-1.71
>> libjpeg 8d : libpng 1.6.16 : libtiff 4.0.3 : zlib 1.2.5
>>
>> This would be really really cool to pull off if possible. Any suggestions
>> are greatly appreciated.
>> thanks!! -leah
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To post to this group, send email to [email protected].
>> Visit this group at http://groups.google.com/group/tesseract-ocr.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/tesseract-ocr/2088977c-529b-45bd-8059-b6906fb666ce%40googlegroups.com
>> For more options, visit https://groups.google.com/d/optout.
>>

--
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/2b2cd6b7-73b3-4f48-b98f-70d3e7289e51%40googlegroups.com.
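For reference, the "pixel-to-pixel" matching suggested in the quoted reply might be sketched as below. This is only an illustration under stated assumptions: the glyphs are tiny made-up 3x3 bitmaps rather than real NES font tiles, the helper names (`from_art`, `crop`, `read_digits`) are hypothetical, and the top-left corner and cell size of the digit row are assumed to be known in advance for each game. Real pixels would come from an image library instead of ASCII art.

```python
def from_art(rows):
    """Turn ASCII art ('#' = lit pixel) into a hashable grid of 0/1 values."""
    return tuple(tuple(1 if c == "#" else 0 for c in row) for row in rows)


def crop(grid, top, left, height, width):
    """Cut a height x width cell out of the grid."""
    return tuple(tuple(row[left:left + width]) for row in grid[top:top + height])


def read_digits(grid, templates, top, left, cell_w, cell_h, count):
    """Read `count` fixed-width cells; '?' marks a cell with no exact match."""
    lookup = {pixels: ch for ch, pixels in templates.items()}
    out = []
    for i in range(count):
        cell = crop(grid, top, left + i * cell_w, cell_h, cell_w)
        out.append(lookup.get(cell, "?"))
    return "".join(out)


# Made-up 3x3 glyph samples standing in for cropped digit images.
TEMPLATES = {
    "0": from_art(["###", "#.#", "###"]),
    "1": from_art([".#.", ".#.", ".#."]),
}

# A fake HUD strip holding the cells 1, 0, 1 side by side.
HUD = from_art([".#.###.#.", ".#.#.#.#.", ".#.###.#."])

print(read_digits(HUD, TEMPLATES, top=0, left=0, cell_w=3, cell_h=3, count=3))  # prints 101
```

Exact equality only works when the screenshot is pixel-perfect, as emulator output at its native resolution should be. A scaled or lossy-compressed image would need a tolerance-based comparison instead.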

