Also, I wanted to show the output from the lower third: 

<https://lh3.googleusercontent.com/-c527dWolCTI/VTlMFuVJG2I/AAAAAAAAATg/i7ELL6UpHFY/s1600/mario_lower.png>

E53333 I-I-I-I-I-|--E.|- $5 a].
ICED}: E]- EIEIEIEEHEIEJ GEE-'3


As you can see, I'm not even getting numbers. :/
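
For what it's worth, the regex pass mentioned in the quoted message below might look roughly like this minimal Python sketch. The function name digit_runs and the min_len cutoff are assumptions made up for illustration; nothing here is part of tesseract itself.

```python
import re


def digit_runs(ocr_text, min_len=3):
    """Return every run of at least min_len consecutive digits in raw OCR text."""
    # min_len is a hypothetical cutoff meant to skip stray single digits.
    pattern = r"\d{%d,}" % min_len
    return re.findall(pattern, ocr_text)


# Hypothetical clean OCR line, only to show the intended behaviour.
print(digit_runs("SCORE 0012300 TIME 287"))
```

Run on the first line of the output above, this finds only the stray run "53333", so a regex by itself cannot tell noise from a real score.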


On Thursday, April 23, 2015 at 12:28:14 PM UTC-7, Leah Siddall wrote:
>
> thanks for your feedback! 
>
> I was hoping not to lock into one video game, and since the high score will 
> not be in the same place in every game, fixed cropping is ruled out. I 
> planned on running a regex against whatever came back from tesseract. I was 
> already counting on garbage in the output, so there was going to be some 
> light scripting wrapped around this. 
>
> But when I cropped only to the "lower third" section of the Mario 
> screenshot, I was still not getting anything close to the score or time. 
> Why is it struggling with this font? It seems incredibly straightforward, 
> except that the "scores" are not a solid color with a border and sometimes 
> they are touching. 
>
> Since this is a new arena to me, can you point me in the right direction 
> for researching how to do the "pixel-to-pixel" matching? 
> Also, I am new to the idea of training tesseract. Can I train it to 
> understand this font? 
>
> This is more exploratory and fun for me, so I am very willing to learn the 
> "correct way" of doing this. I just want to be pointed in the right 
> direction. 
>
> thanks again!!
>
>
> On Thursday, April 23, 2015 at 2:51:55 AM UTC-7, Dmitri Silaev wrote:
>>
>> Hmmm, fixed image size, fixed region, constant colors, monospace raster 
>> font... 
>>
>> Do you really want to engage a whole algorithmic monster to handle a 
>> problem like this? Not to mention that poor performance, training, 
>> preprocessing, and coping with all sorts of recognition problems are 
>> guaranteed.
>>
>> Pixel-to-pixel matching is the way to go! 
>> 100% accuracy.
>>
>> Even if you are not willing to resort to full-fledged programming, just 
>> crop out 10 digit samples and match them to your input image using a shell 
>> script loop. Give your ImageMagick-fu a chance. Or, you can even use file 
>> compare! ))
>>
>> HTH
>>
>> Best regards,
>> Dmitri Silaev
>> www.CustomOCR.com
>>
>>
>>
>>
>>
>> On Thu, Apr 23, 2015 at 9:05 AM, Leah Siddall <
>> [email protected]> wrote:
>>
>>> Hi all! 
>>>
>>> I am not having luck with tesseract and the fonts used in NES games like 
>>> Super Mario Bros. 3 (I've attached an example screenshot).
>>> My goal is to scrape a screenshot for the "score" and "time remaining". 
>>> The idea is to feed that into a database during a competition to minimize 
>>> cheating. 
>>>
>>> I've tried cropping, resizing, grayscale, and negating with PNG, TIF, 
>>> JPG, and PNM formats then going through every PSM mode on each with poor 
>>> results. 
>>> The original screenshot is PNG 4800 × 3600 pixels at 144 pixels/inch 
>>> straight from the emulator which is like the best possible situation. 
>>>
>>> Just trying to get a baseline, I tried against the "Punch Out" 
>>> screenshot (attached), where the characters are clearly spaced and there 
>>> is lots of empty space. It would get "CDHTIHUE" and "Nintendo", but 
>>> totally missed the word "new" between the boxing gloves and jumbled the 
>>> year numbers. 
>>>
>>> To rule out user error, I did run against other images with more 
>>> standard fonts and had no problems. 
>>>
>>> I'm quite comfortable with ImageMagick but very new to tesseract. 
>>> I am using the tesseract version from "brew install tesseract --HEAD" on 
>>> OS X 10.10.2:
>>> tesseract 3.04.00
>>>  leptonica-1.71
>>>   libjpeg 8d : libpng 1.6.16 : libtiff 4.0.3 : zlib 1.2.5
>>>
>>> This would be really really cool to pull off if possible. any 
>>> suggestions are greatly appreciated.
>>> thanks!! -leah
>>>
>>> -- 
>>> You received this message because you are subscribed to the Google 
>>> Groups "tesseract-ocr" group.
>>> To unsubscribe from this group and stop receiving emails from it, send 
>>> an email to [email protected].
>>> To post to this group, send email to [email protected].
>>> Visit this group at http://groups.google.com/group/tesseract-ocr.
>>> To view this discussion on the web visit 
>>> https://groups.google.com/d/msgid/tesseract-ocr/2088977c-529b-45bd-8059-b6906fb666ce%40googlegroups.com
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
>>
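
The pixel-to-pixel matching suggested in the quoted reply above could be sketched roughly as follows, in plain Python with no image library. Everything concrete here is an assumption for illustration: the 3x5 glyph bitmaps in GLYPHS stand in for digit samples cropped from a real screenshot, and the names cell, read_digits and render are hypothetical helpers. The sketch also assumes the crop is already reduced to two colors and that digits sit on a fixed-width grid, as the reply describes.

```python
# Hypothetical 3x5 templates ('#' = ink, '.' = background). Real templates
# would be cropped once from an actual screenshot of the game font.
GLYPHS = {
    "0": ["###", "#.#", "#.#", "#.#", "###"],
    "1": [".#.", "##.", ".#.", ".#.", "###"],
    "2": ["###", "..#", "###", "#..", "###"],
    "3": ["###", "..#", "###", "..#", "###"],
    "4": ["#.#", "#.#", "###", "..#", "..#"],
    "5": ["###", "#..", "###", "..#", "###"],
    "6": ["###", "#..", "###", "#.#", "###"],
    "7": ["###", "..#", "..#", "..#", "..#"],
    "8": ["###", "#.#", "###", "#.#", "###"],
    "9": ["###", "#.#", "###", "..#", "###"],
}


def cell(image, index, width):
    """Cut the index-th fixed-width character cell out of a bitmap."""
    return [row[index * width:(index + 1) * width] for row in image]


def read_digits(image, width=3):
    """Compare each cell pixel for pixel with every template.

    A cell that matches no template exactly is reported as '?'.
    """
    lookup = {tuple(rows): digit for digit, rows in GLYPHS.items()}
    n_cells = len(image[0]) // width
    return "".join(
        lookup.get(tuple(cell(image, i, width)), "?") for i in range(n_cells)
    )


def render(text):
    """Build a test bitmap by placing templates side by side."""
    return ["".join(GLYPHS[ch][r] for ch in text) for r in range(5)]


print(read_digits(render("0012300")))
```

Because the comparison is exact, a single differing pixel turns a cell into '?', which is why this only suits fixed fonts at a fixed scale. For a real screenshot the bitmap rows would come from an image library and a threshold step, and the cell width and position would have to be measured per game.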

