Also, I wanted to show the output from the lower third: <https://lh3.googleusercontent.com/-c527dWolCTI/VTlMFuVJG2I/AAAAAAAAATg/i7ELL6UpHFY/s1600/mario_lower.png>
E53333 I-I-I-I-I-|--E.|- $5 a]. ICED}: E]- EIEIEIEEHEIEJ GEE-'3

As you can see, I'm not even getting numbers. :/

On Thursday, April 23, 2015 at 12:28:14 PM UTC-7, Leah Siddall wrote:
>
> Thanks for your feedback!
>
> I was hoping not to lock into one video game. The high score may not be
> in the same place from game to game, which rules out precise cropping. I
> planned on doing a regex against whatever came back from tesseract. I was
> already counting on garbage information, so there was going to be some
> light scripting wrapping this.
>
> But when I cropped only to the "lower third" section of the Mario
> screenshot, I was still not getting anything close to the score or time.
> Why is it struggling with this font? It seems incredibly straightforward,
> except that the "scores" are not a solid color with a border and sometimes
> they are touching.
>
> Since this is a new arena to me, can you point me in the right direction
> for researching how to do the "pixel-to-pixel" matching?
> And I am new to the idea of training tesseract. Can I train it to
> understand this font?
>
> This is more exploratory and fun for me, so I am very willing to learn the
> "correct way" of doing this. I just want to be pointed in the right
> direction.
>
> Thanks again!!
>
>
> On Thursday, April 23, 2015 at 2:51:55 AM UTC-7, Dmitri Silaev wrote:
>>
>> Hmmm, fixed image size, fixed region, constant colors, monospace raster
>> font...
>>
>> Do you really want to engage a whole algorithmic monster to handle a
>> problem like this? Not to mention that poor performance, training,
>> preprocessing, and coping with all sorts of recognition problems are
>> guaranteed.
>>
>> Pixel-to-pixel matching is the way to go!
>> 100% accuracy.
>>
>> Even if you are not willing to resort to full-fledged programming, just
>> crop out 10 digit samples and match them to your input image using a
>> shell script loop. Give your ImageMagick-fu a chance. Or, you can even
>> use file compare! ))
>>
>> HTH
>>
>> Best regards,
>> Dmitri Silaev
>> www.CustomOCR.com
>>
>>
>> On Thu, Apr 23, 2015 at 9:05 AM, Leah Siddall <[email protected]> wrote:
>>
>>> Hi all!
>>>
>>> I am not having luck with tesseract and the fonts used in NES games like
>>> Super Mario Bros. 3 (I've attached an example screenshot).
>>> My goal is to scrape a screenshot for the "score" and "time remaining".
>>> The idea is to feed that into a database during a competition to
>>> minimize cheating.
>>>
>>> I've tried cropping, resizing, grayscale, and negating with PNG, TIF,
>>> JPG, and PNM formats, then went through every PSM mode on each, with
>>> poor results.
>>> The original screenshot is PNG 4800 × 3600 pixels at 144 pixels/inch
>>> straight from the emulator, which is like the best possible situation.
>>>
>>> Just trying to get a baseline, I tried against the "Punch Out"
>>> screenshot (attached), where the fonts are clearly spaced with lots of
>>> empty space. It would get "CDHTIHUE" and "Nintendo", but totally missed
>>> the word "new" between the boxing gloves and jumbled the year numbers.
>>>
>>> To rule out user error, I did run against other images with more
>>> standard fonts and had no problems.
>>>
>>> I'm quite comfortable with ImageMagick but very new to tesseract.
>>> I am using the tesseract version from "brew install tesseract --HEAD" on
>>> OSX 10.10.2:
>>> tesseract 3.04.00
>>> leptonica-1.71
>>> libjpeg 8d : libpng 1.6.16 : libtiff 4.0.3 : zlib 1.2.5
>>>
>>> This would be really really cool to pull off if possible. Any
>>> suggestions are greatly appreciated.
>>> Thanks!! -leah
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "tesseract-ocr" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> To post to this group, send email to [email protected].
>>> Visit this group at http://groups.google.com/group/tesseract-ocr.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/tesseract-ocr/2088977c-529b-45bd-8059-b6906fb666ce%40googlegroups.com.
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
>>

--
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/f84a83cc-0ac0-45f9-b487-0c98537848ac%40googlegroups.com.
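
For reference, the "pixel-to-pixel" matching suggested in the quoted reply might be sketched roughly as follows. This is only an illustration under stated assumptions, not code from the thread: it assumes the digits sit on a fixed-width grid, and the tiny 3x5 bitmaps in `TEMPLATES` are hypothetical stand-ins for digit samples cropped from a real screenshot. The helper names (`split_cells`, `read_digits`, `render`) are made up for this example.

```python
# Hypothetical 3x5 glyph bitmaps standing in for digit samples cropped from a
# real screenshot ('#' = foreground pixel, '.' = background pixel).
TEMPLATES = {
    "0": ("###", "#.#", "#.#", "#.#", "###"),
    "1": (".#.", "##.", ".#.", ".#.", "###"),
    "2": ("###", "..#", "###", "#..", "###"),
    "3": ("###", "..#", "###", "..#", "###"),
}


def split_cells(rows, glyph_w):
    """Cut a strip of pixel rows into fixed-width cells, left to right."""
    width = len(rows[0])
    return [tuple(row[x:x + glyph_w] for row in rows)
            for x in range(0, width - glyph_w + 1, glyph_w)]


def read_digits(rows, templates, glyph_w=3):
    """Return the text in the strip; '?' marks a cell matching no template."""
    # Exact comparison: a cell is recognized only if every pixel matches.
    lookup = {bitmap: char for char, bitmap in templates.items()}
    return "".join(lookup.get(cell, "?") for cell in split_cells(rows, glyph_w))


def render(text, templates):
    """Build a test strip by pasting templates side by side (assumption:
    glyphs are adjacent with no extra spacing)."""
    height = len(next(iter(templates.values())))
    return ["".join(templates[ch][y] for ch in text) for y in range(height)]


strip = render("3012", TEMPLATES)
print(read_digits(strip, TEMPLATES))  # prints 3012
```

With a real screenshot, the rows would come from the cropped score region's pixels instead of `render`, and the cell size and offsets would have to be measured per game, which is the part that conflicts with not wanting to lock into a single title.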

