This is not really an answer, but I would experiment with a higher-resolution 
image, and maybe with masking the image using GraphicsMagick before running OCR. 
The mask would cover the 'ms', the first 'Mbps', and the second 'Mbps'. Good luck!
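
A rough sketch of the masking and upscaling idea, written in Python so the 
command can be built per image. It only assembles a `gm convert` command line; 
the helper name `build_gm_command`, the output file name, and the box 
coordinates are made up for illustration and would have to be measured on the 
real screenshot. It also assumes a white background, so the mask is filled with 
white, and that the rectangles are drawn before the resize so the coordinates 
stay in original pixels.

```python
import shlex


def build_gm_command(src, dst, mask_boxes, scale_percent=200):
    """Build an argv list for `gm convert`: mask regions, then upscale.

    mask_boxes holds (x0, y0, x1, y1) tuples in pixels of the source image.
    """
    cmd = ["gm", "convert", src, "-fill", "white"]
    for x0, y0, x1, y1 in mask_boxes:
        # Paint a filled rectangle over each unit label.
        cmd += ["-draw", f"rectangle {x0},{y0} {x1},{y1}"]
    # Upscale after masking so Tesseract sees larger glyphs.
    cmd += ["-resize", f"{scale_percent}%", dst]
    return cmd


# Hypothetical boxes for 'ms', the first 'Mbps', and the second 'Mbps'.
boxes = [(60, 310, 100, 335), (250, 310, 320, 335), (440, 310, 510, 335)]
command = build_gm_command("bwResult.png", "masked.png", boxes)
print(shlex.join(command))
```

To actually run it, pass the list to `subprocess.run(command, check=True)` with 
GraphicsMagick installed, then point Tesseract at the output file.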

On Wednesday, June 3, 2015 at 2:10:07 AM UTC-4, Anish Radhakrishnan Nair 
wrote:
>
> I have to read text from screenshots of speed test results and extract the 
> upload and download speeds from them. Most of the images I have tested have 
> been of very high quality and I have binarized and also corrected skew if 
> necessary, but the results are still only at around 60% accuracy. The 
> biggest issue is that after preprocessing some images in which the numbers 
> are very clearly distinguishable are not read well. As an example, I have 
> attached a test image after preprocessing, and the result of Tesseract 
> performing OCR on it.
>
>
> <https://lh3.googleusercontent.com/-kCTaPk5xzeE/VW6Wk2pyN7I/AAAAAAAAAIQ/D7z6oyM3igA/s1600/bwResult.png>
>
> The result I received after performing OCR on this picture, on a single 
> line, is:
> 000003 4G 15:41 4 83% - / OOKLA SPEEDTEST PWG DOWNLOAD UPLOAD 49 ms Mbps 
> Mbps L,» SHARE ‘ ”‘ “\ ‘ 5M I” 1°“ \\ I 2M 20M ‘ I I 1M 0M | , ‘ ‘ ‘ 
> 1,3,!Ht‘u‘z‘gssz‘:}::;\ ..;~,-. ~‘ ‘ ' 'mmW" 50 ,
>
> Note how the Mbps shows up but the number is completely ignored. How do I 
> improve this result?
>
