<https://lh3.googleusercontent.com/-onaaH-CgL5Q/V9nlrH9pfiI/AAAAAAAAAAc/iTy5cGT2lqgQhZn2nhGOINO4e7T31zj8wCLcB/s1600/Sample.png>
I'm not sure if I'm supposed to reply to my own question, but I figured I'd 
share some progress in case someone can give me an idea of where to go from 
here. I changed the image to grayscale and then used win32api to get the 
color of the pixels in the words (I probably should've just used paint.net). 
I've seen people say Tesseract does better with black text on a white 
background, so I made a function that checks whether each pixel is within a 
tolerance of the value I got for the words. The tolerance is needed because 
the words seem to have a weird 8-bit bevel effect: the pixels are lighter in 
the center of the symbols and darker towards the sides. When a pixel fell 
within the tolerance I changed its value to pure black, and all pixels that 
didn't were changed to white. I also scaled the image up to 4x its original 
size, which helped with the reading a bit.
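
In case the description is hard to follow, here is a rough sketch of the 
idea in plain Python. It is not the exact code I ran: the function names, 
the tiny sample image, and the target value of 150 are all made up for 
illustration (only the tolerance of 65 is the real number), and doing the 
scaling after the thresholding is an assumption. A real version would 
read the pixels from the image with an imaging library instead of using 
nested lists.

```python
def binarize_by_tolerance(pixels, target, tolerance):
    """Turn pixels within `tolerance` of `target` black (0), all others white (255).

    `pixels` is a grayscale image stored as a list of rows of 0-255 values.
    """
    return [
        [0 if abs(value - target) <= tolerance else 255 for value in row]
        for row in pixels
    ]


def upscale_nearest(pixels, factor):
    """Enlarge the image by an integer factor by repeating each pixel."""
    scaled = []
    for row in pixels:
        # Repeat every pixel `factor` times horizontally...
        wide_row = [value for value in row for _ in range(factor)]
        # ...then repeat the widened row `factor` times vertically.
        scaled.extend(list(wide_row) for _ in range(factor))
    return scaled


# Hypothetical 2x3 grayscale image, just to exercise the functions.
sample = [
    [10, 120, 200],
    [180, 250, 60],
]

# target=150 is a made-up text intensity; tolerance=65 is the value I used.
binary = binarize_by_tolerance(sample, target=150, tolerance=65)
enlarged = upscale_nearest(binary, factor=4)

for row in binary:
    print(row)
print(len(enlarged), "rows x", len(enlarged[0]), "columns after 4x scaling")
```

Repeating pixels keeps the result strictly black and white; a smoother 
resampling method would bring gray values back in at the edges.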

The results of the Tesseract scan were:
剛坤剛艶

縄〔廿鵬 Ti…曹
B震廿ー尋 鵬〔 2

典肛亦立叫匠
ー席 叫換立『

      

技~ アイテムをえらぶウィンドウを
問いていると~ 時間が止まります。
ゅっ くりと華糞りゃくを祖りながら
バ ト ルがで耆ます ゥ

A worry I have now is that the "Tolerance" I had was 65! Grayscale only has 
an intensity value between 0 and 255, as I understand it. So in order to get 
the words to show up I needed to look for pixels in a range of 130, just 
over half the intensity spectrum. If I were to move from this to a busier 
screen that had sprites on it, I'd imagine a bunch of pixels would be turned 
black while trying to get the writing to clear up, and separating the actual 
text from the noisy garbage you get from trying to read a black splotch is 
going to be hard. Is there any way to do this?
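
To put a number on the tolerance worry, here is a small sketch of the 
arithmetic. The helper names and the target value of 128 are hypothetical 
(only the tolerance of 65 is real); the band is clipped to 0-255, so a 
target near either end of the scale would cover fewer values.

```python
def tolerance_band(target, tolerance, lo=0, hi=255):
    """Return the (low, high) intensities matched, clipped to the valid range."""
    return max(lo, target - tolerance), min(hi, target + tolerance)


def band_fraction(target, tolerance, lo=0, hi=255):
    """Fraction of all possible intensity values that fall inside the band."""
    low, high = tolerance_band(target, tolerance, lo, hi)
    return (high - low + 1) / (hi - lo + 1)


# target=128 is a made-up mid-gray value, used only for illustration.
low, high = tolerance_band(target=128, tolerance=65)
fraction = band_fraction(target=128, tolerance=65)

print(low, high, high - low)   # band limits and its width
print(round(fraction, 3))      # share of the 256 possible values
```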
