Thanks for your feedback!

I was hoping not to lock into one video game. The score won't be in the same 
place from game to game, which rules out cropping to a precise region. I 
planned on running a regex against whatever came back from tesseract. I was 
already counting on some garbage in the output, so there was going to be 
some light scripting wrapped around this. 
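
To make the regex idea concrete, here is a rough sketch of the kind of wrapper I had in mind. The field widths (a 7-digit score and a 3-digit timer) are my assumption about the Super Mario Bros. 3 status bar, and the function name is made up for illustration; other games would need different patterns.

```python
import re

def extract_score_and_time(ocr_text):
    """Pick a candidate score and timer out of noisy OCR text.

    Assumption: the score is a standalone run of exactly 7 digits and
    the timer is a standalone run of exactly 3 digits.
    """
    # The lookarounds stop a match from starting or ending inside a longer digit run.
    score = re.search(r"(?<!\d)\d{7}(?!\d)", ocr_text)
    timer = re.search(r"(?<!\d)\d{3}(?!\d)", ocr_text)
    return (
        int(score.group()) if score else None,
        int(timer.group()) if timer else None,
    )

if __name__ == "__main__":
    # Made-up sample of junk-laden OCR output.
    print(extract_score_and_time("W0RLD 1 ~~ 0012300 $ 12 @ 287 xx"))
```

Anything that does not match comes back as None, so the surrounding script can flag the screenshot for a manual look instead of storing a bad value.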

But when I cropped to only the "lower third" section of the Mario 
screenshot, I still wasn't getting anything close to the score or time. 
Why is it struggling with this font? It seems incredibly straightforward, 
except that the "scores" are not a solid color with a border and they 
sometimes touch each other. 

Since this is a new arena for me, can you point me in the right direction 
for researching how to do the "pixel-to-pixel" matching? 
Also, I am new to the idea of training tesseract. Can I train it to 
understand this font? 
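
For reference, this is how I currently understand the pixel-to-pixel idea, as a toy sketch only. It assumes the digits sit in fixed-width cells and that each cell matches a stored sample exactly. The 3x3 "glyphs" and the function names below are invented for illustration; real samples would be cropped from the screenshot itself.

```python
# Hypothetical digit samples: each glyph is a tuple of pixel rows,
# with '#' for a lit pixel and '.' for background.
TEMPLATES = {
    "0": ("###", "#.#", "###"),
    "1": (".#.", ".#.", ".#."),
    "7": ("###", "..#", "..#"),
}

def split_cells(rows, cell_width):
    """Cut a strip of pixel rows into fixed-width cells, left to right."""
    width = len(rows[0])
    return [
        tuple(row[x:x + cell_width] for row in rows)
        for x in range(0, width - cell_width + 1, cell_width)
    ]

def read_digits(rows, templates, cell_width):
    """Match every cell exactly against the templates; '?' marks no match."""
    lookup = {glyph: char for char, glyph in templates.items()}
    return "".join(lookup.get(cell, "?") for cell in split_cells(rows, cell_width))

if __name__ == "__main__":
    strip = (
        "###.#.###",
        "..#.#.#.#",
        "..#.#.###",
    )
    print(read_digits(strip, TEMPLATES, 3))
```

If that is roughly the right idea, the part I am unsure about is locating the digit cells when the layout differs between games.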

This is more exploratory and fun for me, so I am very willing to learn the 
"correct way" of doing this. I just want to be pointed in the right 
direction. 

Thanks again!!


On Thursday, April 23, 2015 at 2:51:55 AM UTC-7, Dmitri Silaev wrote:
>
> Hmmm, fixed image size, fixed region, constant colors, monospace raster 
> font... 
>
> Do you really want to engage a whole algorithmic monster to handle a 
> problem like this? Not to mention poor performance, training, 
> preprocessing, coping with all sorts of recognition problems is guaranteed.
>
> Pixel-to-pixel matching is the way to go! 
> 100% accuracy.
>
> Even if you are not willing to resort to full-fledged programming, just crop 
> out 10 digit samples and match them to your input image using a shell 
> script loop. Give your ImageMagick-fu a chance. Or, you can even use file 
> compare! ))
>
> HTH
>
> Best regards,
> Dmitri Silaev
> www.CustomOCR.com
>
>
>
>
>
> On Thu, Apr 23, 2015 at 9:05 AM, Leah Siddall <[email protected]> wrote:
>
>> Hi all! 
>>
>> I am not having luck with tesseract and the fonts used in NES games like 
>> Super Mario Bros. 3 (I've attached an example screenshot).
>> My goal is to scrape a screenshot for the "score" and "time remaining". The 
>> idea is to feed that into a database during a competition to minimize 
>> cheating. 
>>
>> I've tried cropping, resizing, grayscale, and negating with PNG, TIF, 
>> JPG, and PNM formats then going through every PSM mode on each with poor 
>> results. 
>> The original screenshot is PNG 4800 × 3600 pixels at 144 pixels/inch 
>> straight from the emulator which is like the best possible situation. 
>>
>> Just trying to get a baseline, I tried against the "Punch Out" screenshot 
>> (attached), where the fonts are clearly spaced with lots of empty space. It 
>> would get "CDHTIHUE" and "Nintendo", but totally missed the word "new" 
>> between the boxing gloves and jumbled the year numbers. 
>>
>> To rule out user error, I did run against other images with more standard 
>> fonts and had no problems. 
>>
>> I'm quite comfortable with imagemagick but very new to tesseract. 
>> I am using tesseract version from "brew install tesseract -HEAD" on 
>> OSX 10.10.2
>> tesseract 3.04.00
>>  leptonica-1.71
>>   libjpeg 8d : libpng 1.6.16 : libtiff 4.0.3 : zlib 1.2.5
>>
>> This would be really, really cool to pull off if possible. Any suggestions 
>> are greatly appreciated.
>> thanks!! -leah
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected].
>> To post to this group, send email to [email protected].
>> Visit this group at http://groups.google.com/group/tesseract-ocr.
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/tesseract-ocr/2088977c-529b-45bd-8059-b6906fb666ce%40googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>

