Hmmm, fixed image size, fixed region, constant colors, monospace raster font...
Do you really want to unleash a whole algorithmic monster on a problem like this? With OCR you are guaranteed poor performance, plus training, preprocessing, and all sorts of recognition problems to cope with. Pixel-to-pixel matching is the way to go: 100% accuracy. Even if you're not willing to resort to full-fledged programming, just crop out 10 digit samples and match them against your input image in a shell script loop. Give your ImageMagick-fu a chance. Or you can even use a file compare! ))

HTH

Best regards,
Dmitri Silaev
www.CustomOCR.com

On Thu, Apr 23, 2015 at 9:05 AM, Leah Siddall <[email protected]> wrote:
> Hi all!
>
> I am not having luck with tesseract and the fonts used in NES games like
> Super Mario Bros. 3 (I've attached an example screenshot).
> My goal is to scrape a screenshot for the "score" and "time remaining". The
> idea is to feed that into a database during a competition to minimize
> cheating.
>
> I've tried cropping, resizing, grayscale, and negating with PNG, TIF, JPG,
> and PNM formats, then going through every PSM mode on each, with poor
> results.
> The original screenshot is a 4800 × 3600 pixel PNG at 144 pixels/inch,
> straight from the emulator, which is about the best possible situation.
>
> Just trying to get a baseline, I tried the "Punch Out" screenshot
> (attached), where the fonts are clearly spaced with lots of empty space. It
> would get "CDHTIHUE" and "Nintendo", but totally missed the word "new"
> between the boxing gloves and jumbled the year numbers.
>
> To rule out user error, I did run it against other images with more
> standard fonts and had no problems.
>
> I'm quite comfortable with imagemagick but very new to tesseract.
> I am using the tesseract version from "brew install tesseract -HEAD" on
> OSX 10.10.2:
> tesseract 3.04.00
> leptonica-1.71
> libjpeg 8d : libpng 1.6.16 : libtiff 4.0.3 : zlib 1.2.5
>
> This would be really, really cool to pull off if possible. Any suggestions
> are greatly appreciated.
> thanks!!
> -leah
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/2088977c-529b-45bd-8059-b6906fb666ce%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
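Dmitri's pixel-to-pixel matching suggestion can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a finished tool: it assumes the screenshot has already been reduced to the emulator's native pixel grid and thresholded to black/white (for example with ImageMagick), and that the field's top-left corner and the per-digit pitch were measured once by hand. The 3x5 `FONT` below is a hypothetical stand-in for the ten real digit samples you would crop out, and `parse`, `crop`, `mismatches`, `read_digits`, and `render` are illustrative helpers, not part of any library.

```python
from typing import Dict, List

Bitmap = List[List[int]]  # rows of pixels: 1 = ink, 0 = background


def parse(rows: List[str]) -> Bitmap:
    """Turn rows of '#'/'.' characters into a 0/1 bitmap."""
    return [[1 if ch == "#" else 0 for ch in row] for row in rows]


# Hypothetical 3x5 glyphs standing in for the ten digit samples you would
# crop out of a real (binarized, native-resolution) screenshot.
FONT: Dict[str, Bitmap] = {
    digit: parse(rows)
    for digit, rows in {
        "0": ["###", "#.#", "#.#", "#.#", "###"],
        "1": [".#.", "##.", ".#.", ".#.", "###"],
        "2": ["###", "..#", "###", "#..", "###"],
        "3": ["###", "..#", "###", "..#", "###"],
        "4": ["#.#", "#.#", "###", "..#", "..#"],
        "5": ["###", "#..", "###", "..#", "###"],
        "6": ["###", "#..", "###", "#.#", "###"],
        "7": ["###", "..#", "..#", "..#", "..#"],
        "8": ["###", "#.#", "###", "#.#", "###"],
        "9": ["###", "#.#", "###", "..#", "###"],
    }.items()
}


def crop(img: Bitmap, x: int, y: int, w: int, h: int) -> Bitmap:
    """Cut out the w x h rectangle whose top-left corner is (x, y)."""
    return [row[x:x + w] for row in img[y:y + h]]


def mismatches(a: Bitmap, b: Bitmap) -> int:
    """Number of pixels that differ between two equally sized bitmaps."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def read_digits(img: Bitmap, templates: Dict[str, Bitmap], x: int, y: int,
                n_cells: int, pitch: int, max_errors: int = 0) -> str:
    """Read n_cells fixed-width digits starting at (x, y), one every `pitch` px.

    With max_errors=0 this is strict pixel-for-pixel equality.
    """
    sample = next(iter(templates.values()))
    h, w = len(sample), len(sample[0])
    out = []
    for i in range(n_cells):
        cell = crop(img, x + i * pitch, y, w, h)
        if len(cell) != h or any(len(row) != w for row in cell):
            raise ValueError(f"cell {i} falls outside the image")
        # Pick the template with the fewest differing pixels.
        digit, errors = min(
            ((d, mismatches(cell, t)) for d, t in templates.items()),
            key=lambda pair: pair[1],
        )
        if errors > max_errors:
            raise ValueError(f"cell {i}: best match {digit!r} is {errors} px off")
        out.append(digit)
    return "".join(out)


def render(text: str, templates: Dict[str, Bitmap], pitch: int,
           margin: int = 1) -> Bitmap:
    """Build a synthetic test image with the digits laid out at a fixed pitch."""
    h = len(templates["0"])
    img = [[0] * (2 * margin + pitch * len(text)) for _ in range(h + 2 * margin)]
    for i, ch in enumerate(text):
        for r, row in enumerate(templates[ch]):
            for c, px in enumerate(row):
                img[margin + r][margin + i * pitch + c] = px
    return img


if __name__ == "__main__":
    screen = render("0012345", FONT, pitch=4)
    print(read_digits(screen, FONT, x=1, y=1, n_cells=7, pitch=4))  # 0012345
```

Because emulator output is deterministic, the default `max_errors=0` gives the exact-match behaviour described in the reply. The `max_errors` tolerance is an optional addition of this sketch, only worth raising if the capture path introduces noise (for instance, lossy JPEG screenshots).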

